Summary

The dataset covers the last 20000 seismic events recorded by the USGS and a few other networks before 2024. It contains each event’s location and magnitude, as well as very limited data on the stations used to detect it. It was obtained from the USGS earthquake catalogue (https://earthquake.usgs.gov/earthquakes/search/), and its documentation is linked on that page.

The main purpose of this project is to allow me to pass this class and learn how to program in R. The question I hope to answer with it is “how frequently do major (magnitude at least 7.5) earthquakes occur?”

The data may be insufficient for that, as only 49 such earthquakes are present in the dataset. However, this could be addressed by adding the 20000 earthquakes immediately prior to the first datapoint in the dataset (or more; 20000 is simply the maximum number of datapoints I can retrieve at once).
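A minimal sketch of how the earlier events could be requested, assuming the catalogue search page is backed by the USGS FDSN event web service and that its `endtime`, `limit` and `orderby` parameters behave as documented there. The helper name `build_usgs_query` and the example timestamp are hypothetical; the sketch only builds the request URL and downloads nothing.

```r
# Sketch: build a query URL for the events immediately before a given time.
# build_usgs_query() is a hypothetical helper, not part of any package.
build_usgs_query <- function(end_time, limit = 20000) {
  stopifnot(limit >= 1, limit <= 20000)  # one request returns at most 20000 events
  base <- "https://earthquake.usgs.gov/fdsnws/event/1/query"
  params <- c(
    format  = "csv",
    endtime = format(end_time, "%Y-%m-%dT%H:%M:%S", tz = "UTC"),
    limit   = sprintf("%d", as.integer(limit)),
    orderby = "time"  # newest first, so the limit keeps the events closest to end_time
  )
  paste0(base, "?", paste(names(params), params, sep = "=", collapse = "&"))
}

# Hypothetical timestamp standing in for min(quakes$time)
earliest <- as.POSIXct("2023-06-01 12:00:00", tz = "UTC")
url <- build_usgs_query(earliest)
print(url)
# older <- read.csv(url)  # uncomment to download the older batch
```

If the older batch is appended to `quakes`, it is probably worth checking for a duplicated event at the boundary time before counting anything.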

I have already addressed one of the interesting things about the dataset (the presence of nuclear explosions as sources of seismic activity), but I never addressed the presence of volcanic activity within the dataset. Why are these datapoints confined to only two locations?

ggplot() + 
  geom_map(data = map_data("world"), map = map_data("world"), aes(map_id = region)) +
  geom_polygon(fill="white", colour = "black") +
  geom_point(size = 0.5, data = quakes %>% filter(type == "volcanic eruption"), mapping = aes(x = longitude, y = latitude), color="red") +
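  geom_point(alpha=0, data = quakes %>% filter(type == "earthquake"), mapping = aes(x = longitude, y = latitude)) # Invisible points: they stretch the axes to the full extent of the earthquake data, so the map is not cropped to the volcanic points.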
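One way to start on the question of why the volcanic datapoints sit in only two locations is to count them per location and per reporting network. The sketch below uses base R on a toy data frame; the rows are invented for illustration, and the `net` column (the reporting network) is assumed to be present in the downloaded CSV.

```r
# Toy stand-in for the quakes data frame; every value below is invented.
quakes_demo <- data.frame(
  type      = c("volcanic eruption", "volcanic eruption", "volcanic eruption", "earthquake"),
  latitude  = c(10.1, 10.3, -30.2, 35.0),
  longitude = c(50.2, 50.4, 70.1, 139.0),
  net       = c("aa", "aa", "bb", "cc")
)

volcanic <- quakes_demo[quakes_demo$type == "volcanic eruption", ]

# Round coordinates to whole degrees so nearby events fall into the same cell
volcanic$cell <- paste(round(volcanic$latitude), round(volcanic$longitude))

# Cross-tabulate location cell against reporting network
cluster_counts <- table(cell = volcanic$cell, net = volcanic$net)
print(cluster_counts)
```

If each location turns out to be reported by a single network, that would suggest the "volcanic eruption" label depends on who reports the event rather than on where eruptions happen, though this is only a hypothesis to check against the real data.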

Another interesting thing is the presence of earthquakes a significant distance from tectonic plate boundaries. Some appear in China, Russia and northern North America, in places that do not line up with any plate boundary.

ggplot() + 
  geom_map(data = map_data("world"), map = map_data("world"), aes(map_id = region)) +
  geom_polygon(fill="white", colour = "black") +
  geom_point(size=0.1, data = quakes %>% filter(type == "earthquake"), mapping = aes(x = longitude, y = latitude))

My plan moving forwards is to continue working on things as necessary. It may be worth looking into a possible relationship between location and magnitude, and it may also be worth adding additional data.

Initial Findings

1 - Do higher-magnitude earthquakes occur more frequently at sea or near land?

ggplot() + 
  geom_map(data = map_data("world"), map = map_data("world"), aes(map_id = region)) +
  geom_polygon(fill="white", colour = "black") +
  geom_point(size = 0.0125, data = quakes, mapping = aes(x = longitude, y = latitude, color = mag))
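The map gives a visual impression, but a rough numeric comparison is also possible. The sketch below assumes the `maps` package (which `map_data("world")` already relies on) and uses its `map.where()` function to label each point as on land or at sea. The toy rows are invented and carry no conclusion, and the threshold of 6 is an arbitrary choice for illustration.

```r
library(maps)

# Toy stand-in for quakes; every value below is invented.
quakes_demo <- data.frame(
  longitude = c(2.35, 0, -150, 37.6),
  latitude  = c(48.85, 0, 0, 55.75),
  mag       = c(4.1, 6.3, 4.0, 6.5)
)

# map.where() returns the name of the map polygon containing each point,
# or NA when the point lies in no polygon (treated here as sea)
region <- map.where("world", quakes_demo$longitude, quakes_demo$latitude)
quakes_demo$setting <- ifelse(is.na(region), "sea", "land")

# Share of events at or above the magnitude threshold within each setting
threshold <- 6
share_large <- tapply(quakes_demo$mag >= threshold, quakes_demo$setting, mean)
print(share_large)
```

This only separates points on land from points at sea. An offshore event close to the coast still counts as "sea", so answering the "near land" part of the question would need a distance-to-coast measure instead.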

2 - What is the average time between major earthquakes across the world?

# Keep major earthquakes (magnitude at least 7.5, as defined in the Summary) in chronological order
data <- quakes %>% filter(mag >= 7.5) %>% arrange(time)
# Number the events and compute the gap since the previous one, explicitly in hours
data <- data %>% mutate(n = row_number(), timeSinceLastHrs = as.numeric(difftime(time, lag(time), units = "hours")))
ggplot() + geom_point(data = data, aes(x = n, y = timeSinceLastHrs))
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).
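The plot shows the individual gaps, while the question asks for an average. A minimal sketch of that summary, using invented event times in place of the filtered major earthquakes:

```r
# Toy event times standing in for the major earthquakes (invented values)
times <- as.POSIXct(c("2020-01-01", "2020-01-11", "2020-03-01", "2020-03-31"), tz = "UTC")
times <- sort(times)

# Gaps between consecutive events in days; diff() has no leading NA, unlike lag()
gaps_days <- as.numeric(diff(times), units = "days")

mean_gap   <- mean(gaps_days)
median_gap <- median(gaps_days)

# Equivalent rate: events per year implied by the mean gap
events_per_year <- 365.25 / mean_gap

print(c(mean = mean_gap, median = median_gap, per_year = events_per_year))
```

The mean gap is simply the time between the first and last event divided by the number of gaps, so it depends heavily on where the dataset happens to start and end. The median is less sensitive to a single unusually long or short gap.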