week8

The obvious response variable for this dataset is earthquake magnitude.

Unfortunately, the notable categorical columns are either unrelated to the event itself or should have no bearing on magnitude. The one exception, event type, has too few non-earthquake records to support anything beyond statements such as “no major seismic events were caused by volcanic eruptions in the past 12 years” or “major seismic events caused by volcanic eruptions are rare (or possibly nonexistent)”.

# There simply aren't enough volcanic eruptions or nuclear explosions in this dataset to draw useful conclusions from what is present.
summary(aov(mag ~ type, quakes))

##                Df Sum Sq Mean Sq F value Pr(>F)
## type            2      0  0.1014     0.6  0.549
## Residuals   19997   3381  0.1691

As mentioned above, there are very few non-earthquake events, and the ANOVA's p-value of 0.549 gives no evidence that mean magnitude differs between event types. Combined with the data’s skew towards lower-magnitude earthquakes and the absence of any events below magnitude 5, this means there is not enough data to draw conclusions about the mean magnitude of each event type. It is reasonable to assume that a very large number of events are excluded by the data’s lower bound on magnitude.
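Both parts of this argument (the group imbalance and the magnitude floor) can be checked directly. The sketch below uses a hypothetical helper, `summarise_by_type()`, and assumes only a data frame with the `type` and `mag` columns used in the ANOVA above; it reports the number of events of each type along with their mean and minimum magnitude.

```r
# Hypothetical helper: per-type event counts and magnitude summaries.
# Assumes `data` has a `type` column (character or factor) and a numeric `mag` column.
summarise_by_type <- function(data) {
  groups <- split(data$mag, data$type)  # one magnitude vector per event type
  data.frame(
    type = names(groups),
    n = vapply(groups, length, integer(1)),
    mean_mag = vapply(groups, mean, numeric(1), na.rm = TRUE),
    min_mag = vapply(groups, min, numeric(1), na.rm = TRUE),
    row.names = NULL
  )
}

# Usage on the dataset from this report (not run here):
# summarise_by_type(quakes)
```

A very small `n` for the non-earthquake types, together with a `min_mag` that never drops below 5, would be consistent with the reasoning above.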

None of the continuous (or integer) columns shows a linear relationship with magnitude either, at least within this dataset. Factors not contained in the data may still play a role in determining magnitude, however.

lm(mag ~ depth, quakes)

## 
## Call:
## lm(formula = mag ~ depth, data = quakes)
## 
## Coefficients:
## (Intercept)        depth  
##   5.3227163    0.0003696

lm(mag ~ magNst, quakes)

## 
## Call:
## lm(formula = mag ~ magNst, data = quakes)
## 
## Coefficients:
## (Intercept)       magNst  
##   5.3282654   -0.0005329

lm(mag ~ gap, quakes)

## 
## Call:
## lm(formula = mag ~ gap, data = quakes)
## 
## Coefficients:
## (Intercept)          gap  
##    5.571827    -0.004006

lm(mag ~ dmin, quakes)

## 
## Call:
## lm(formula = mag ~ dmin, data = quakes)
## 
## Coefficients:
## (Intercept)         dmin  
##    5.348565    -0.001791

lm(mag ~ rms, quakes)

## 
## Call:
## lm(formula = mag ~ rms, data = quakes)
## 
## Coefficients:
## (Intercept)          rms  
##      5.1950       0.1744

quakes %>% ggplot(mapping = aes(x = rms, y = mag)) + geom_point() + geom_smooth(method = "lm")

## `geom_smooth()` using formula = 'y ~ x'

## Warning: Removed 116 rows containing non-finite outside the scale range
## (`stat_smooth()`).

## Warning: Removed 116 rows containing missing values or values outside the scale range
## (`geom_point()`).

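The slopes for most of these models are very small, which is as expected, given that none of the columns shows a clear (or even a faint) linear relationship with magnitude. A slope's size also depends on the units of its predictor, though, so on its own it does not measure how strong a linear relationship is.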
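A more direct measure of fit is each model's R² together with the p-value of its slope. The sketch below is a minimal version of that check, assuming the same `quakes` columns as above; the helper name `fit_single_predictors()` is hypothetical. It refits `mag ~ predictor` for each column and collects the slope, R² and p-value into one table.

```r
# Hypothetical helper: fit response ~ predictor separately for each predictor and
# report the slope, R-squared and the slope's p-value.
# Assumes `data` contains the response and every named predictor as numeric columns;
# lm() drops rows with NAs in the variables used by each model.
fit_single_predictors <- function(data, response, predictors) {
  rows <- lapply(predictors, function(p) {
    fit <- lm(reformulate(p, response = response), data = data)
    est <- coef(summary(fit))  # coefficient matrix: Estimate, Std. Error, t value, Pr(>|t|)
    data.frame(
      predictor = p,
      slope = est[2, "Estimate"],
      r_squared = summary(fit)$r.squared,
      p_value = est[2, "Pr(>|t|)"]
    )
  })
  do.call(rbind, rows)
}

# Usage on the dataset from this report (not run here):
# fit_single_predictors(quakes, "mag", c("depth", "magNst", "gap", "dmin", "rms"))
```

R² values close to 0 for every predictor would support the conclusion that none of them explains much of the variation in magnitude, regardless of how large or small the raw slopes look.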

The model using ‘rms’ has the largest slope, but slopes for predictors measured on different scales are not directly comparable, and given how heavily the data is skewed towards lower magnitudes, it is hard to say whether there is actually a relationship there or not.
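One way to compare predictors on a common footing is to standardise both variables before fitting: for a single-predictor model, the slope on z-scored variables equals the Pearson correlation between them. A minimal sketch, assuming the same `quakes` columns; `standardised_slopes()` is a hypothetical helper name.

```r
# Hypothetical helper: Pearson correlation between the response and each predictor,
# using only rows where both values are present.
# For a one-predictor model, this equals the slope fitted on standardised (z-scored)
# variables, so the values are comparable across predictors with different units.
standardised_slopes <- function(data, response, predictors) {
  vapply(predictors, function(p) {
    cor(data[[response]], data[[p]], use = "complete.obs")
  }, numeric(1))
}

# Usage on the dataset from this report (not run here):
# standardised_slopes(quakes, "mag", c("depth", "magNst", "gap", "dmin", "rms"))
```

Values near 0 indicate a weak linear relationship whatever the predictor's units; the sign still gives the direction of the fitted slope.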

Connor McNally

2024-10-22