earthquakesWithRThis project was centered around a dataset obtained from the U.S. National Oceanographic and Atmospheric Administration (NOAA) on significant earthquakes around the world. This dataset contains information about 5’952 earthquakes over an approximately 4,000 year time span (data downloaded on the 09.07.2017). The project was the capstone of the course Mastering Software Development in R from Coursera.
The end result of this project is a fully functioning package on GitHub (earthquakesWithR) which can be installed as follows:
# install.packages("devtools")
devtools::install_github("moralmar/earthquakesWithR")The following shall give you an impression what the package can do and where the limits are. We are not going to examine in detail the code here, however, a brief overview of the package itself is:
and the most important functions (which we will see in the code below), are:
ggplot2 packageleaflet map and visualizing the earth quakesGenerally, to create a timeline or an interactive map, we only need to provde two inputs:
There are a few additional options, but which mainly serve graphical purposes and are not discussed in detail here.
So let’s start with some simple examples and going from there into more complex ones. We’ll start with the timeline functions. And then we’ll take the same data and plug them into the leaflet map.
# required packages
require(earthquakesWithR)
require(magrittr) # for piping operator
require(dplyr) # for filter/subsetting functions
require(ggplot2) # should be obvious :)
require(ggthemes) # for the theme_classic()Let’s start with a few examples with just one and three countries.
# Load the data and filter it accordingly to your needs/inputs
quakes <- eq_clean_data()
#> Warning: package 'bindrcpp' was built under R version 3.3.3
quakes_filtered_1c <- quakes %>%
filter(COUNTRY %in% c("MEXICO") & YEAR >= 2000)
# single country without labels
timeline1 <- ggplot() +
geom_timeline(data = quakes_filtered_1c, aes(x = date, y = COUNTRY, color = TOTAL_DEATHS, size = EQ_PRIMARY)) +
theme_classic()
timeline1# and now with some labels
timeline1 + geom_timeline_label(data = quakes_filtered_1c, aes(x = date,
y = COUNTRY,
magnitude = EQ_PRIMARY,
label = LOCATION_NAME,
n_max = 3))Now let’s see how this looks with three countries (we’ll directly add now the labels).
You’ll notice that I added a few additional lines of code for the ggplot. This is merely to make the plot more attractive
# Load the data and filter it accordingly to your needs/inputs
quakes <- eq_clean_data()
quakes_filtered_3c <- quakes %>%
filter(COUNTRY %in% c("PHILIPPINES", "INDONESIA", "PAPUA NEW GUINEA") & YEAR >= 1900)
# three country with labels
ggplot() +
geom_timeline(data = quakes_filtered_3c, aes(x = date, y = COUNTRY, color = TOTAL_DEATHS, size = EQ_PRIMARY)) +
geom_timeline_label(data = quakes_filtered_3c, aes(x = date,
y = COUNTRY,
magnitude = EQ_PRIMARY,
label = LOCATION_NAME,
n_max = 5)) +
theme_classic() +
# Added code for attractiveness
theme(legend.position = "bottom", legend.direction = "horizontal",
axis.line.y = element_blank()) +
labs(size = "Magnitude ", color = "Deaths ")And what about Europe? Well, we clearly see here the first limitation of the package: the labels don’t always fit into the graph and the more countries are shown, the more difficult it gets to see all labels. Further formating of the label text would be necessary.
# These are the European countries. I hope I didn't miss any
Europe <- c("AZORES (PORTUGAL)", "ALBANIA", "ANDORRA", "AUSTRIA", "BELARUS", "BELGIUM", "BOSNIA-HERZEGOVINA", "BOSNIA", "HERZEGOVINA2", "BULGARIA", "CROATIA", "CYPRUS", "CZECH REPUBLIC", "DENMARK", "FINLAND", "FRANCE", "GEORGIA", "GEROMANY", "GREECE", "HUNGARY", "ICELAND", "IRELAND", "ITALY", "KOSOVO", "LIECHTENSTEIN", "LUXEMBOURG", "MACEDONIA", "MALTA", "MOLDOVA", "MONACO", "MONTENEGRO", "NETHERLANDS", "NORWAY", "POLAND", "PORTUGAL", "ROMANIA", "SAN MARINO", "SERBIA", "SERBIA AND MONTENEGRO", "SLOVAKIA", "SLOVENIA", "SPAIN", "SWEDEN", "SWITZERLAND", "TURKEY", "UKRAINE", "UK", "UK TERRITORY", "VATICAN CITY")
quakes_EU <- NOAAearthquakes %>%
filter(COUNTRY %in% Europe & YEAR >= 2000)
# Europe with lables
ggplot() +
geom_timeline(data = quakes_EU, aes(x = date, y = COUNTRY, color = TOTAL_DEATHS, size = EQ_PRIMARY)) +
geom_timeline_label(data = quakes_EU, aes(x = date,
y = COUNTRY,
magnitude = EQ_PRIMARY,
label = LOCATION_NAME,
n_max = 1)) +
theme_classic() +
# Added code for attractiveness
theme(legend.position = "bottom", legend.direction = "horizontal",
axis.line.y = element_blank()) +
labs(size = "Magnitude ", color = "Deaths ")Leavlet MapWe already have three filtered data sets containing the countries of interest. Let’s see how the package is visualizing them via the leaflet map
(!!) Don’t forget to klick on the data points and explore the map by zooming further in (!!)
require(leaflet)
require(sp) # to handle the spatial polygon
quakes_filtered_1c %>%
dplyr::mutate(popup_text = eq_create_label(.)) %>%
eq_map(annot_col = "popup_text")You’ll notice that the interval of the date is changed. Going from 1900 until today, I had too many data points which overloaded the map.
quakes_filtered_3c %>%
dplyr::filter(lubridate::year(date) >= 2010) %>% # additional subset, or else too many points
dplyr::mutate(popup_text = eq_create_label(.)) %>%
eq_map(annot_col = "popup_text")Same hold here: the range of the dates had to be ajusted.
(!!) Important to note here is that though we defined Europe to be the countries of interest, only countries which actually had events are marked with a polygon (!!)
quakes_EU %>%
dplyr::filter(lubridate::year(date) >= 2010) %>% # additional subset, or else too many points
dplyr::mutate(popup_text = eq_create_label(.)) %>%
eq_map(annot_col = "popup_text")We can see here that there is a small flaw: the polygon is not being mapped to every country which had an event. It seems that in the region of Serbia, there are some countries not properly matched (matching is happening by name, hence, a special character might be involved there).
The package earthquakesWithR provides very interesting functions which helps to easily create timelines with and without labels. And of course the interactive leaflet maps which gives very fast a very nice overview of the spatial situation (including labels of the data points).
Further work includes:
geosphere package - function: destPoint(), reference example done in geom_hurricane)