Week 6 data dive

data = drop_na(quakes %>% select("mag", "magError")) %>% mutate(magMax = mag + magError)
ggplot(data, aes(x = magMax, y = magError)) + geom_point()

Lower-magnitude earthquakes seem more likely to have a higher degree of uncertainty in their measured magnitude. This could be a result of fewer stations detecting these earthquakes, the methods for determining magnitude being less reliable at lower magnitudes, or a combination of both.

cor(data$magMax, data$magError, use = "complete.obs", method="pearson")

## [1] -0.1077151

The correlation coefficient is negative, which agrees with the fairly clear downwards trend, but it is much weaker than the plot suggests at first glance. With the visualization used, it is impossible to tell how many values overlap in some places on the graph (in particular, towards the bottom-left corner), and there may be enough datapoints there to dilute the downwards trend.
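
One way to check how many points are stacked in a crowded corner is to make density visible in the plot itself. The sketch below uses synthetic stand-in data (the `demo` data frame and its columns are made up for illustration, not taken from the earthquake dataset) and shows two common ggplot2 options: semi-transparent points and 2D bin counts.

```r
library(ggplot2)

# Synthetic stand-in data (NOT the earthquake dataset): a dense cluster in the
# bottom-left plus a sparser band that slopes downwards.
set.seed(1)
x_dense  <- rnorm(5000, mean = 5.2, sd = 0.1)
y_dense  <- rnorm(5000, mean = 0.05, sd = 0.01)
x_sparse <- runif(300, min = 5, max = 8)
y_sparse <- 0.3 - 0.08 * (x_sparse - 5) + rnorm(300, sd = 0.02)
demo <- data.frame(x = c(x_dense, x_sparse), y = c(y_dense, y_sparse))

# Option 1: semi-transparent points, so darker areas hold more observations.
p_alpha <- ggplot(demo, aes(x = x, y = y)) + geom_point(alpha = 0.1)

# Option 2: 2D binning, where the fill colour shows the count in each rectangle.
p_bins <- ggplot(demo, aes(x = x, y = y)) + geom_bin_2d(bins = 40)
```

Printing `p_alpha` or `p_bins` draws the plot. The same two layers could be swapped in for `geom_point()` in the plot above to see whether the bottom-left corner really holds most of the data.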

data2 = drop_na(quakes %>% select("dmin", "depthError")) %>% mutate(depthRange = 2 * depthError)
ggplot(data2, aes(x = dmin, y = depthRange)) + geom_point()

It doesn’t seem like there’s much of a correlation between the depth error (or range) and the minimum horizontal distance. Much of the data exists on a horizontal line.

cor(data2$dmin, data2$depthRange, use = "complete.obs", method="pearson")

## [1] -0.1614855

The correlation coefficient is likely negative due to the sheer number of datapoints between dmin=0 and dmin=10.
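
A rough way to test this kind of explanation is to compute the coefficient separately with and without the crowded region. The sketch below does that on synthetic stand-in data (the variable names and the cut-off at 10 are assumptions chosen to mirror the plot, not values from the earthquake dataset): a dense, scattered group at small x and a sparse, flat group at larger x.

```r
# Synthetic stand-in data (NOT the earthquake dataset).
set.seed(3)
near_x <- runif(3000, min = 0, max = 10)   # dense group at small distances
near_y <- runif(3000, min = 0, max = 30)   # widely scattered error range
far_x  <- runif(500, min = 10, max = 50)   # sparse group at larger distances
far_y  <- 4 + rnorm(500, sd = 0.5)         # nearly a horizontal line

x <- c(near_x, far_x)
y <- c(near_y, far_y)

# Correlation over everything versus only the points beyond the dense region.
cor_all <- cor(x, y, method = "pearson")
cor_far <- cor(x[x > 10], y[x > 10], method = "pearson")
```

In this made-up example the overall coefficient is clearly negative while the coefficient for the far group alone is close to zero, which is the pattern one would expect if the dense region is driving the sign.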

ggplot(data = drop_na(quakes %>% select("magNst", "magError")), aes(x = magNst, y = magError)) + geom_point()

The error in measured magnitude decreases seemingly exponentially as the number of stations used to calculate the magnitude increases.

cor(quakes$magNst, quakes$magError, use = "complete.obs", method="spearman")

## [1] -0.7578848

This correlation is stronger (more negative) than expected, though a substantial number of datapoints do not line up with the general trend.
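
A rank-based coefficient such as Spearman's is a reasonable fit for a curved but steadily decreasing relationship, because it only asks whether one variable goes down as the other goes up. The sketch below illustrates this on synthetic stand-in data (the decay rate, noise level, and variable names are assumptions for illustration only) by comparing Pearson and Spearman on an exponential-looking decay.

```r
# Synthetic stand-in data (NOT the earthquake dataset): error decays roughly
# exponentially as the number of stations grows, with a little noise.
set.seed(42)
n_stations <- sample(5:100, size = 2000, replace = TRUE)
mag_error  <- 0.3 * exp(-0.05 * n_stations) + rnorm(2000, sd = 0.002)

# Pearson measures linear association; Spearman measures monotonic association.
pearson  <- cor(n_stations, mag_error, method = "pearson")
spearman <- cor(n_stations, mag_error, method = "spearman")
```

Here `spearman` comes out closer to -1 than `pearson`, since the curve is monotonic but far from a straight line.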

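errorsNoNA <- drop_na(quakes %>% select("magError"))
errorMean = mean(errorsNoNA %>% pluck("magError"))
# Note: length() of a data frame is its number of columns (1 here), so errorSE
# equals the sample standard deviation; nrow(errorsNoNA) would give the standard error of the mean.
errorSE = sd(errorsNoNA %>% pluck("magError")) / sqrt(length(errorsNoNA))
# 2.807 is the two-sided 99.5% quantile of the standard normal distribution
c(errorMean - 2.807*errorSE, errorMean + 2.807*errorSE)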

## [1] -0.0118588  0.1425854

Assuming the errors are roughly normally distributed, the maximum error in measured magnitude for an earthquake of magnitude at least 5 should fall between 0 and 0.143 approximately 99.5% of the time. (The computed lower bound is slightly negative, but a value measuring an error cannot be negative, so the range starts at 0.) This range may differ from what would be expected across earthquakes of any magnitude, including those below 5.
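
It is worth being explicit about which kind of range this is. In R, `length()` of a data frame counts columns while `nrow()` counts rows, and the mean plus or minus a multiple of the standard deviation describes individual values, whereas the mean plus or minus a multiple of the standard error describes the mean itself. The sketch below shows the difference on synthetic stand-in values (the `errors` data frame is made up for illustration and is not the earthquake data).

```r
# Synthetic stand-in values (NOT the earthquake dataset).
set.seed(7)
errors <- data.frame(magError = rnorm(1000, mean = 0.065, sd = 0.0275))

n_cols <- length(errors)  # number of columns in the data frame: 1
n_rows <- nrow(errors)    # number of observations: 1000

z <- qnorm(0.9975)        # two-sided 99.5% normal quantile, about 2.807
m <- mean(errors$magError)
s <- sd(errors$magError)

# Range expected to cover about 99.5% of individual values (if roughly normal).
value_range <- c(m - z * s, m + z * s)

# 99.5% confidence interval for the mean, using the standard error.
se <- s / sqrt(n_rows)
mean_ci <- c(m - z * se, m + z * se)
```

The confidence interval for the mean is much narrower than the range for individual values, so the two should not be read interchangeably.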

This is not unexpected, as the average number of stations which are used to calculate an earthquake’s magnitude seems to be around 85, and the magnitude error among all datapoints in the dataset doesn’t exceed 0.15 until the number of stations used drops below approximately 60 (and doesn’t exceed 0.15 excluding outliers until the number of stations drops below 20-25).

Connor McNally

2024-10-08