Introduction to earthquakesWithR

This project was centered around a dataset obtained from the U.S. National Oceanographic and Atmospheric Administration (NOAA) on significant earthquakes around the world. This dataset contains information about 5’952 earthquakes over an approximately 4,000 year time span (data downloaded on the 09.07.2017). The project was the capstone of the course Mastering Software Development in R from Coursera.
The end result of this project is a fully functioning package on GitHub (earthquakesWithR) which can be installed as follows:

# install.packages("devtools")
devtools::install_github("moralmar/earthquakesWithR")

The following shall give you an impression what the package can do and where the limits are. We are not going to examine in detail the code here, however, a brief overview of the package itself is:

  • Earth quakes and Timeline - plots a special timelines
  • Earth quakes and Leaflet map - creates an interactive map

and the most important functions (which we will see in the code below), are:

  • geom_timeline() - creates a specific geom for the ggplot2 package
  • geom_timeline_label() - addes labels (top “n” records in magnitudes) to the geom above
  • eq_clean_data() - a function which loads and applies a few cleaning function to the NOAA data set
  • eq_map() - creats a leaflet map and visualizing the earth quakes
  • eq_create_label() - is a helper function, creating a nice pop-ups in the leaflet map

Generally, to create a timeline or an interactive map, we only need to provde two inputs:

  • the country
  • and the date interval of interest

There are a few additional options, but which mainly serve graphical purposes and are not discussed in detail here.
So let’s start with some simple examples and going from there into more complex ones. We’ll start with the timeline functions. And then we’ll take the same data and plug them into the leaflet map.

# required packages
require(earthquakesWithR)

require(magrittr) # for piping operator
require(dplyr) # for filter/subsetting functions
require(ggplot2) # should be obvious :)
require(ggthemes) # for the theme_classic()

Timelines

Let’s start with a few examples with just one and three countries.

One Country - Mexico (without labels)

# Load the data and filter it accordingly to your needs/inputs
quakes <- eq_clean_data()
#> Warning: package 'bindrcpp' was built under R version 3.3.3
quakes_filtered_1c <- quakes %>%
        filter(COUNTRY %in% c("MEXICO") & YEAR >= 2000)

# single country without labels
timeline1 <- ggplot() +
        geom_timeline(data = quakes_filtered_1c, aes(x = date, y = COUNTRY, color = TOTAL_DEATHS, size = EQ_PRIMARY)) +
        theme_classic()
timeline1

One Country - Mexico (with labels)

# and now with some labels
timeline1 + geom_timeline_label(data = quakes_filtered_1c, aes(x = date,
                                                 y = COUNTRY,
                                                 magnitude = EQ_PRIMARY,
                                                 label = LOCATION_NAME,
                                                 n_max = 3))

Three Countries - Philippines, Indonesia and Papua New Guinea

Now let’s see how this looks with three countries (we’ll directly add now the labels).
You’ll notice that I added a few additional lines of code for the ggplot. This is merely to make the plot more attractive

# Load the data and filter it accordingly to your needs/inputs
quakes <- eq_clean_data()
quakes_filtered_3c <- quakes %>%
        filter(COUNTRY %in% c("PHILIPPINES", "INDONESIA", "PAPUA NEW GUINEA") & YEAR >= 1900)

# three country with labels
ggplot() +
        geom_timeline(data = quakes_filtered_3c, aes(x = date, y = COUNTRY, color = TOTAL_DEATHS, size = EQ_PRIMARY)) +
        geom_timeline_label(data = quakes_filtered_3c, aes(x = date,
                                                 y = COUNTRY,
                                                 magnitude = EQ_PRIMARY,
                                                 label = LOCATION_NAME,
                                                 n_max = 5)) +
        theme_classic() +
        # Added code for attractiveness
        theme(legend.position = "bottom", legend.direction = "horizontal",
              axis.line.y = element_blank()) +
        labs(size = "Magnitude ", color = "Deaths ")

Multiple Countries - Europe

And what about Europe? Well, we clearly see here the first limitation of the package: the labels don’t always fit into the graph and the more countries are shown, the more difficult it gets to see all labels. Further formating of the label text would be necessary.

# These are the European countries. I hope I didn't miss any
Europe <- c("AZORES (PORTUGAL)", "ALBANIA", "ANDORRA", "AUSTRIA", "BELARUS", "BELGIUM", "BOSNIA-HERZEGOVINA", "BOSNIA", "HERZEGOVINA2", "BULGARIA", "CROATIA", "CYPRUS", "CZECH REPUBLIC", "DENMARK", "FINLAND", "FRANCE", "GEORGIA", "GEROMANY", "GREECE", "HUNGARY", "ICELAND", "IRELAND", "ITALY", "KOSOVO", "LIECHTENSTEIN", "LUXEMBOURG", "MACEDONIA", "MALTA", "MOLDOVA", "MONACO", "MONTENEGRO", "NETHERLANDS", "NORWAY", "POLAND", "PORTUGAL", "ROMANIA", "SAN MARINO", "SERBIA", "SERBIA AND MONTENEGRO", "SLOVAKIA", "SLOVENIA", "SPAIN", "SWEDEN", "SWITZERLAND", "TURKEY", "UKRAINE", "UK", "UK TERRITORY", "VATICAN CITY")


quakes_EU <- NOAAearthquakes %>%
        filter(COUNTRY %in% Europe & YEAR >= 2000)

# Europe with lables
ggplot() +
        geom_timeline(data = quakes_EU, aes(x = date, y = COUNTRY, color = TOTAL_DEATHS, size = EQ_PRIMARY)) +
        geom_timeline_label(data = quakes_EU, aes(x = date,
                                                 y = COUNTRY,
                                                 magnitude = EQ_PRIMARY,
                                                 label = LOCATION_NAME,
                                                 n_max = 1)) +
        theme_classic() +
        # Added code for attractiveness
        theme(legend.position = "bottom", legend.direction = "horizontal",
              axis.line.y = element_blank()) +
        labs(size = "Magnitude ", color = "Deaths ")

Interactive Leavlet Map

We already have three filtered data sets containing the countries of interest. Let’s see how the package is visualizing them via the leaflet map

Mexico - from 2000 until today

(!!) Don’t forget to klick on the data points and explore the map by zooming further in (!!)

require(leaflet)
require(sp) # to handle the spatial polygon

quakes_filtered_1c %>%
        dplyr::mutate(popup_text = eq_create_label(.)) %>%
        eq_map(annot_col = "popup_text")

Philippines, Indonesia and Papua New Guinea - from 2010 until today

You’ll notice that the interval of the date is changed. Going from 1900 until today, I had too many data points which overloaded the map.

quakes_filtered_3c %>%
        dplyr::filter(lubridate::year(date) >= 2010) %>% # additional subset, or else too many points
        dplyr::mutate(popup_text = eq_create_label(.)) %>%
        eq_map(annot_col = "popup_text")

Europe - from 2010 until today

Same hold here: the range of the dates had to be ajusted.
(!!) Important to note here is that though we defined Europe to be the countries of interest, only countries which actually had events are marked with a polygon (!!)

quakes_EU %>%
        dplyr::filter(lubridate::year(date) >= 2010) %>% # additional subset, or else too many points
        dplyr::mutate(popup_text = eq_create_label(.)) %>%
        eq_map(annot_col = "popup_text")

We can see here that there is a small flaw: the polygon is not being mapped to every country which had an event. It seems that in the region of Serbia, there are some countries not properly matched (matching is happening by name, hence, a special character might be involved there).

Conclusion

The package earthquakesWithR provides very interesting functions which helps to easily create timelines with and without labels. And of course the interactive leaflet maps which gives very fast a very nice overview of the spatial situation (including labels of the data points).
Further work includes:

  • Correcting the country names in all data sets such that the polyons are being correctly mapped
  • Formating the labels in the timeline such that the labels are not being cut-off by the graphic border
  • Formatting the data point in the interactive map such that they represent the actual earth quake better, e.g. with
    • multiple rings depending on magnitude
    • size of point should represent the actual area of influence
    • maybe representing the data point with a circle is not correct, maybe an ellipse might be more adequate (I have in mind the geosphere package - function: destPoint(), reference example done in geom_hurricane)