An earthquake (quake) is the shaking of the Earth's surface resulting from a sudden release of energy in the Earth's lithosphere that creates seismic waves. The word earthquake is used to describe any seismic event that generates seismic waves. Many things can cause earthquakes (nuclear explosions, for example), but most are caused by the rupture of geological faults. The hypocenter is the point of initial rupture, and the epicenter is the point on the surface directly above it. The seismic activity of an area is the frequency, type, and size of earthquakes experienced over a particular period. Earthquakes occur most commonly at tectonic plate boundaries.
Earthquakes occur where there is sufficient stored elastic strain energy to drive fracture propagation along a fault plane. Once the fault has locked, continued relative motion between the plates leads to increasing stress and, therefore, stored strain energy in the volume around the fault surface. This continues until the stress has risen sufficiently to break through the asperity, suddenly allowing sliding over the locked portion of the fault, releasing the stored energy.
There are three main types of fault: normal, reverse (thrust), and strike-slip.
The energy released by an earthquake increases roughly 32-fold for every one-unit increase in magnitude. This is because the energy released, and thus the magnitude, is proportional to the area of the fault that ruptures and to the stress drop.
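The magnitude-energy relationship is often approximated by the empirical Gutenberg-Richter energy relation \(\log_{10} E = 1.5M + 4.8\), with \(E\) in joules; this is a standard approximation, not something derived from this report's data. A minimal Python sketch (the report itself uses R) shows where the roughly 32-fold factor comes from: the constant cancels in a ratio, leaving \(10^{1.5} \approx 31.6\) per magnitude unit.

```python
def radiated_energy_joules(magnitude: float) -> float:
    """Approximate radiated energy (J) from the empirical relation
    log10(E) = 1.5 * M + 4.8."""
    return 10 ** (1.5 * magnitude + 4.8)


def energy_ratio(m_high: float, m_low: float) -> float:
    """How many times more energy a quake of magnitude m_high releases than
    one of magnitude m_low (the 4.8 constant cancels out)."""
    return radiated_energy_joules(m_high) / radiated_energy_joules(m_low)


print(round(energy_ratio(6.0, 5.0), 1))  # one unit: about 31.6 times
print(round(energy_ratio(7.0, 5.0)))     # two units: about 1000 times
```

Two magnitude units therefore correspond to about 1,000 times more energy.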
Earthquakes can be part of a sequence or cluster (in close spatial and temporal proximity) of smaller or larger earthquakes (foreshocks and aftershocks) that are related to each other in location and time. Smaller earthquakes that follow the largest one (the main shock) are called aftershocks, and sometimes smaller earthquakes, called foreshocks, precede the main shock. When many earthquakes strike close to each other in the same area within a short time period, the sequence is called an earthquake swarm. Earthquakes in a swarm tend to have similar magnitudes.
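Earthquakes with greater magnitude tend to be less frequent than those of lesser magnitude. This inverse frequency-magnitude relationship is usually described by the Gutenberg-Richter law, while the number of earthquakes occurring in a fixed time interval is often modeled with a Poisson distribution.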
In this section, we will load our data and clean it. We will also create a map of Japan, which we will later use to plot our earthquakes and see which locations were most affected. This data set contains earthquakes recorded around the world since 1906.
## Classes 'data.table' and 'data.frame': 283132 obs. of 22 variables:
## $ time : POSIXct, format: "2023-02-26 23:58:05" "2023-02-26 23:33:17" ...
## $ latitude : num 41.8 18.7 42.1 14.9 44.7 ...
## $ longitude : num 79.9 145.5 80 -104.6 146.5 ...
## $ depth : num 10 200 10 10 134 ...
## $ mag : num 5 4.8 4.9 4.6 4.5 5.1 4.6 5 4.5 5 ...
## $ magType : chr "mb" "mb" "mb" "mb" ...
## $ nst : int 46 67 45 51 108 31 31 44 37 32 ...
## $ gap : num 91 85 77 217 62 77 79 175 108 121 ...
## $ dmin : num 1.29 5.16 1.22 5.66 2.87 ...
## $ rms : num 0.8 0.95 0.82 0.57 0.82 0.94 0.56 0.85 0.6 0.68 ...
## $ net : chr "us" "us" "us" "us" ...
## $ id : chr "us6000jrt9" "us6000jrt6" "us6000jrss" "us6000jrsv" ...
## $ updated : POSIXct, format: "2023-02-27 00:11:38" "2023-02-26 23:58:32" ...
## $ place : chr "77 km NNW of Aksu, China" "Pagan region, Northern Mariana Islands" "" "northern East Pacific Rise" ...
## $ type : chr "earthquake" "earthquake" "earthquake" "earthquake" ...
## $ horizontalError: num 6.59 10.27 6.27 11.79 8.66 ...
## $ depthError : num 1.9 7.54 1.87 1.98 5.86 ...
## $ magError : num 0.078 0.043 0.086 0.038 0.059 0.089 0.084 0.03 0.101 0.116 ...
## $ magNst : int 52 165 42 205 84 41 42 343 29 24 ...
## $ status : chr "reviewed" "reviewed" "reviewed" "reviewed" ...
## $ locationSource : chr "us" "us" "us" "us" ...
## $ magSource : chr "us" "us" "us" "us" ...
## - attr(*, ".internal.selfref")=<externalptr>
We only want those earthquakes that occurred in or around Japan:
## Rows: 7,426
## Columns: 6
## $ time <dttm> 2023-02-25 13:27:42, 2023-02-25 10:25:44, 2023-02-21 18:05:…
## $ latitude <dbl> 42.7801, 27.1653, 27.1068, 42.7977, 26.4738, 32.6612, 23.223…
## $ longitude <dbl> 145.0736, 126.7673, 143.5371, 143.1312, 128.7992, 141.6609, …
## $ mag <dbl> 6.0, 5.1, 5.1, 5.0, 5.2, 5.8, 5.1, 5.2, 5.0, 6.3, 5.0, 5.2, …
## $ depth <dbl> 50.181, 10.000, 13.323, 130.444, 13.336, 11.000, 10.000, 44.…
## $ place <chr> "61 km ESE of Kushiro, Japan", "133 km NW of Ishikawa, Japan…
We have selected only the earthquakes of magnitude greater than or equal to 5 (M \(\geq 5\)). Note that some foreshocks and aftershocks are therefore left out (for example, those of magnitude 4.9), which will affect our conclusions. Next, we create a dummy set, which we will use to manipulate and transform the data, along with our training and testing sets. The training set is composed of the earthquakes that happened between 2010 - 2015, and the testing set of those that happened between 2016 - 2021.
One important thing to notice is the difference in the number of observations between the two sets: one has almost one thousand more observations than the other. Why is that? We will look into it in the Exploratory and Data Analysis section.
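A minimal Python sketch of the selection and split (the report does this in R, and its filtering code is not shown). It assumes a simple record type, `Quake`, and approximates "in or around Japan" by the `place` string containing "Japan"; the helper names and the filter rule are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Quake:
    time: datetime
    mag: float
    place: str


def select_japan_m5(quakes):
    """Hypothetical filter: keep events whose place string mentions Japan
    and whose magnitude is at least 5 (M >= 5)."""
    return [q for q in quakes if "Japan" in q.place and q.mag >= 5.0]


def split_by_year(quakes, train=(2010, 2015), test=(2016, 2021)):
    """Split events into training and testing sets by calendar year (inclusive)."""
    train_set = [q for q in quakes if train[0] <= q.time.year <= train[1]]
    test_set = [q for q in quakes if test[0] <= q.time.year <= test[1]]
    return train_set, test_set


# A few rows printed in this report, plus one toy M 4.9 event.
sample = [
    Quake(datetime(2023, 2, 26, 23, 58, 5), 5.0, "77 km NNW of Aksu, China"),
    Quake(datetime(2023, 2, 25, 13, 27, 42), 6.0, "61 km ESE of Kushiro, Japan"),
    Quake(datetime(2010, 1, 14, 18, 46, 25), 5.0, "51 km E of Shizunai-furukawachō, Japan"),
    Quake(datetime(2016, 1, 5, 2, 21, 11), 5.8, "166 km E of Nishinoomote, Japan"),
    Quake(datetime(2016, 3, 1, 12, 0, 0), 4.9, "toy event, Japan"),
]
japan = select_japan_m5(sample)
train_set, test_set = split_by_year(japan)
print(len(japan), len(train_set), len(test_set))
```

The 2023 Kushiro event passes the filter but falls in neither period, which is why the two sets together are smaller than the filtered data.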
We created a dummy set, which is going to be used to manipulate data and add fields. Note that the Regions variable was added.
## Classes 'data.table' and 'data.frame': 7426 obs. of 7 variables:
## $ time : POSIXct, format: "1906-01-21 13:49:34" "1906-02-24 00:13:41" ...
## $ latitude : num 34.2 34.6 32.6 36.7 35.8 ...
## $ longitude: num 138 141 132 142 137 ...
## $ mag : num 7.4 6.25 6.64 6.61 5.77 6.5 6.01 7.29 6.38 6.23 ...
## $ depth : num 300 15 30 35 15 25 35 15 20 35 ...
## $ place : chr "50 km SSW of Ōyama, Japan" "116 km ESE of Katsuura, Japan" "36 km E of Nobeoka, Japan" "132 km ESE of Iwaki, Japan" ...
## $ Regions : Factor w/ 7 levels "East","North",..: 4 1 4 1 2 6 4 1 4 1 ...
## - attr(*, ".internal.selfref")=<externalptr>
Our training set contains all the quakes between 2010 - 2015.
## Rows: 1,473
## Columns: 7
## Index: time
## $ time <dttm> 2010-01-14 18:46:25, 2010-01-15 11:08:38, 2010-01-17 06:04:…
## $ latitude <dbl> 42.338, 26.746, 37.938, 30.964, 23.488, 40.332, 24.126, 23.7…
## $ longitude <dbl> 142.991, 126.285, 143.599, 130.888, 123.615, 143.651, 122.93…
## $ mag <dbl> 5.0, 5.7, 5.6, 5.5, 6.3, 5.3, 5.2, 5.5, 7.0, 5.2, 5.0, 5.0, …
## $ depth <dbl> 60.1, 139.1, 7.0, 38.2, 21.0, 15.8, 29.2, 37.2, 25.0, 35.0, …
## $ place <chr> "51 km E of Shizunai-furukawachō, Japan", "151 km WNW of Nah…
## $ Regions <fct> North East, South West, East, South, South West, North East,…
The testing set contains all the quakes between 2016 - 2021.
## Rows: 565
## Columns: 7
## Index: time
## $ time <dttm> 2016-01-05 02:21:11, 2016-01-05 21:59:50, 2016-01-09 14:12:…
## $ latitude <dbl> 30.6132, 22.0669, 27.9136, 44.4761, 41.9723, 43.1102, 39.710…
## $ longitude <dbl> 132.7337, 143.6108, 129.6042, 141.0867, 142.7810, 146.0168, …
## $ mag <dbl> 5.8, 5.7, 5.4, 6.2, 6.7, 5.1, 5.3, 5.7, 5.7, 5.1, 5.6, 5.1, …
## $ depth <dbl> 4.71, 158.00, 23.50, 238.81, 46.00, 38.44, 12.23, 196.00, 41…
## $ place <chr> "166 km E of Nishinoomote, Japan", "Volcano Islands, Japan r…
## $ Regions <fct> South, South East, South West, North East, North East, North…
One can already see a difference between the number of observations: our training set has 908 more quakes than the testing set (1,473 vs. 565).
We are looking to categorize our earthquakes by magnitude. We will use the EMS Intensities Synopsis (European Macroseismic Scale).
Based on the image above, we will now create our intensity variable. This variable will later help us count occurrences by intensity, serve as the target for supervised learning, and distinguish quakes on the map. Since predicting an exact magnitude is difficult, we will classify quakes by intensity instead.
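A minimal Python sketch of the binning step (the report does it in R). The cutoffs are hypothetical: whole-magnitude bins starting at M 5 agree with the sample rows printed in this report (6.25 and 6.64 as “Slightly Damaging”, 7.40 as “Damaging”, the 9.1 quake as “Destructive”), but the report does not state its exact thresholds.

```python
import bisect

# Hypothetical lower bounds for each label after "Strong" (M >= 5 assumed).
CUTOFFS = [6.0, 7.0, 8.0, 9.0]
LABELS = ["Strong", "Slightly Damaging", "Damaging", "Heavily Damaging", "Destructive"]


def intensity(mag: float) -> str:
    """Map a magnitude to an intensity label; each cutoff is an inclusive
    lower bound of the next label (e.g. 6.0 -> "Slightly Damaging")."""
    return LABELS[bisect.bisect_right(CUTOFFS, mag)]


for m in (5.77, 6.25, 7.40, 9.1):
    print(m, intensity(m))
```

Using `bisect_right` keeps the rule in one sorted list, so changing a threshold means editing a single number.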
## Rows: 7,426
## Columns: 10
## $ year <fct> 1906, 1906, 1906, 1906, 1906, 1906, 1906, 1906, 1906, 1906, …
## $ date <date> 1906-01-21, 1906-02-24, 1906-03-13, 1906-04-08, 1906-04-20,…
## $ time <times> 13:49:34, 00:13:41, 13:26:41, 17:37:52, 19:39:41, 01:13:39…
## $ latitude <dbl> 34.175, 34.633, 32.560, 36.723, 35.810, 25.361, 34.093, 34.7…
## $ longitude <dbl> 138.025, 141.424, 132.052, 142.318, 137.070, 124.680, 134.89…
## $ mag <dbl> 7.40, 6.25, 6.64, 6.61, 5.77, 6.50, 6.01, 7.29, 6.38, 6.23, …
## $ intensity <fct> Damaging, Slightly Damaging, Slightly Damaging, Slightly Dam…
## $ depth <dbl> 300, 15, 30, 35, 15, 25, 35, 15, 20, 35, 35, 35, 20, 35, 35,…
## $ place <chr> "50 km SSW of Ōyama, Japan", "116 km ESE of Katsuura, Japan"…
## $ Regions <fct> South, East, South, East, North, South West, South, East, So…
Our data set now has more fields: we split the time variable into two variables (date and time), created the intensity variable, and added a year variable to count the number of quakes by year. Creating a field that classifies earthquakes as foreshocks, aftershocks, or swarms is such a risky task that we do not attempt it here; one could easily misclassify a swarm of quakes as foreshocks. To create such a field, one would have to look at both the difference in time and the difference in magnitude between quakes. If the time difference is longer than two (2) days, the quakes should not be classified as foreshocks or aftershocks (personal opinion). If the magnitude difference is not greater than \(0.3\) (M) and the time difference is less than 2 days, the quakes should be classified as a swarm (again, personal opinion); if the magnitude difference is \(> 0.3\), the smaller quake should be classified as a foreshock (or as an aftershock if it follows the larger one).
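The rule of thumb above can be sketched in Python (the report itself is in R) for a single pair of quakes. This only illustrates the author's personal heuristic, not a standard declustering method; it ignores the distance between the quakes, and the function name `classify_pair` is hypothetical.

```python
from datetime import datetime, timedelta

MAX_GAP = timedelta(days=2)  # author's suggested time window
MAG_TOL = 0.3                # author's suggested magnitude tolerance


def classify_pair(t1, m1, t2, m2):
    """Label the relationship between two quakes: 'unrelated', 'swarm',
    'foreshock' (the smaller quake comes first) or 'aftershock'
    (the smaller quake comes second)."""
    if t2 < t1:  # order the pair chronologically
        t1, m1, t2, m2 = t2, m2, t1, m1
    if t2 - t1 > MAX_GAP:
        return "unrelated"
    if abs(m2 - m1) <= MAG_TOL:
        return "swarm"
    return "foreshock" if m1 < m2 else "aftershock"


t0 = datetime(2020, 1, 1)
print(classify_pair(t0, 5.0, t0 + timedelta(hours=6), 6.0))  # smaller quake first
print(classify_pair(t0, 5.1, t0 + timedelta(hours=6), 5.2))  # similar magnitudes
```

Even this simple version shows the ambiguity the text warns about: a long sequence would need every pair compared, and one quake could receive different labels from different partners.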
Four of the world's fifteen major tectonic plates converge in Japan, and a fifth of the world's most powerful earthquakes occur in or around Japan. It follows that Japan experiences a great deal of seismic activity. We will look at the number of earthquakes that happened in Japan between 2000 - 2022 (3,144 quakes in total).
We selected the earthquakes between 2000 - 2022. We can see an abnormally large number of earthquakes (with \(M \geq 5\)) in 2011: a total of 870 quakes, the equivalent of experiencing 2 or 3 earthquakes a day. That year, Japan suffered the largest earthquake ever recorded in its history: on March 11, 2011, a magnitude 9.1 earthquake hit the Tohoku region, causing a massive tsunami. These events devastated Japan's infrastructure and economy, and thousands of people died or lost their homes.
One can also notice an increase in quakes relative to the median line (106 quakes per year). In fact, 1,121 quakes happened between 2000 and 2010, and 1,153 between 2012 and 2022. Does this mean that the number of earthquakes is increasing? Not necessarily, but it is worth noting.
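A short Python sketch of the yearly tally behind these numbers (the report uses R), with toy event years rather than the report's data. Filling in years that had zero events matters: leaving them out would inflate the median.

```python
from collections import Counter
from statistics import median


def yearly_counts(years, start, end):
    """Number of events per calendar year from start to end inclusive,
    with zero for years that had no events."""
    counts = Counter(years)
    return {y: counts.get(y, 0) for y in range(start, end + 1)}


def total_between(counts, start, end):
    """Sum the yearly counts over an inclusive range of years."""
    return sum(n for y, n in counts.items() if start <= y <= end)


# Toy event years, not the report's data.
toy_years = [2000, 2000, 2001, 2003, 2003, 2003]
counts = yearly_counts(toy_years, 2000, 2003)
print(counts)                            # 2002 appears with 0 events
print(median(counts.values()))           # median quakes per year
print(total_between(counts, 2001, 2003))

# The report's totals are consistent: 1,121 + 870 + 1,153 = 3,144 quakes,
# and 870 quakes in 2011 is about 2.4 per day.
print(1121 + 870 + 1153, round(870 / 365, 2))
```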
## Total earthquakes by Intensity
## Strong 2856
## Slightly Damaging 259
## Damaging 27
## Heavily Damaging 1
## Destructive 1
In the space of two decades, Japan suffered one heavily damaging and one destructive earthquake.
Now, we will look at the locations of the earthquakes in Japan. We will color them by intensity to distinguish between them, and, because we want quakes with greater magnitude to appear bigger, we set the size argument to the magnitude.
Most of the earthquakes happened in the East region of Japan, including the most powerful ones. As this is the most affected area, we might expect Japan to take measures there in building construction, insurance pricing, and safety, to prevent deaths or serious injuries. Based on this map, we may infer that the zone most prone to quakes, especially powerful ones, is the East. What measures should Japan take for this region?
## Total earthquakes by Intensity
## Strong 1338
## Slightly Damaging 124
## Damaging 10
## Heavily Damaging 0
## Destructive 1
As noted before, a destructive quake happened in 2011, and 37% (10 of 27) of all damaging earthquakes between 2000 - 2022 happened during 2010 - 2015.
Between 2016 - 2021, the earthquakes were more scattered, but they still follow the same pattern; 2 of the 3 “Damaging” quakes happened in the East. Also notice that no heavily damaging or destructive earthquake happened during these years. Why are earthquakes concentrated in this area?
## Total earthquakes by Intensity
## Strong 525
## Slightly Damaging 37
## Damaging 3
## Heavily Damaging 0
## Destructive 0
By viewing the maps, one may infer that the most affected area in Japan is the East, but is this true? To check, we must count the earthquakes that happened in this region. First, we will create a table comparing the number of quakes in the East with the total number of quakes.
How many earthquakes occurred in the East region between 2010 - 2015 and 2016 - 2021?
## Total (2000 - 2022) East (2000 - 2022) Total (2010 - 2015) East (2010 - 2015)
## 1 3144 1505 1473 1001
## Total (2016 - 2021) East (2016 - 2021)
## 1 565 188
A total of 1,505 earthquakes happened in the East region (latitude: \(30^\circ - 40^\circ\), longitude: \(\geq 140^\circ\)) between 2000 - 2022. \(68\%\) of all the earthquakes between 2010 - 2015 happened in this region, compared with \(33\%\) between 2016 - 2021. Overall, \(48\%\) of the earthquakes between 2000 - 2022 happened in the East, or \(34\%\) if 2011 is excluded. Hence, the East appears to be the most earthquake-prone region.
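The percentages above can be recomputed from the table counts. A small Python sketch (the report uses R); `is_east` encodes the stated region bounds, with inclusive boundaries as an assumption.

```python
# Counts copied from the table above.
COUNTS = {
    "2000-2022": {"total": 3144, "east": 1505},
    "2010-2015": {"total": 1473, "east": 1001},
    "2016-2021": {"total": 565, "east": 188},
}


def east_share(period):
    """Fraction of the period's quakes that fell in the East region."""
    c = COUNTS[period]
    return c["east"] / c["total"]


def is_east(lat, lon):
    """East region as defined in the text: latitude 30-40 degrees and
    longitude >= 140 degrees (inclusive bounds are an assumption)."""
    return 30 <= lat <= 40 and lon >= 140


for period in COUNTS:
    print(f"{period}: {east_share(period):.0%}")
```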
First, note that these are not the same coordinates as in the previous maps. The area in cyan is another region of interest because it contains a large number of earthquakes, as does the South West (orange). But are these really the most affected regions? We will look for the areas with the largest concentration of quakes using the k-medoids algorithm (clara).
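R's `clara` (from the `cluster` package) runs PAM on subsamples of a large data set. As a simpler stand-in, the Python sketch below implements alternating k-medoids, without PAM's swap search or CLARA's subsampling. It starts from a deterministic farthest-first choice of medoids and treats latitude and longitude as plain Euclidean coordinates; these are assumptions, and the toy points are not the report's data.

```python
import math


def assign(points, medoids):
    """Index of the nearest medoid for every point (Euclidean distance)."""
    return [min(range(len(medoids)), key=lambda j: math.dist(p, medoids[j]))
            for p in points]


def init_medoids(points, k):
    """Deterministic start: the most central point, then repeatedly the point
    farthest from the medoids chosen so far (farthest-first heuristic)."""
    medoids = [min(points, key=lambda c: sum(math.dist(c, q) for q in points))]
    while len(medoids) < k:
        medoids.append(max(points, key=lambda p: min(math.dist(p, m) for m in medoids)))
    return medoids


def k_medoids(points, k, max_iter=100):
    """Alternate between assigning points to the nearest medoid and moving
    each medoid to the member with the smallest total distance to the rest
    of its cluster; stop when the medoids no longer change."""
    medoids = init_medoids(points, k)
    for _ in range(max_iter):
        labels = assign(points, medoids)
        new_medoids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if not members:  # empty cluster: keep its medoid
                new_medoids.append(medoids[j])
                continue
            new_medoids.append(min(members,
                                   key=lambda c: sum(math.dist(c, q) for q in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(points, medoids)


# Toy (latitude, longitude) pairs forming two well-separated groups.
pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
medoids, labels = k_medoids(pts, k=2)
print(sorted(medoids), labels)
```

Unlike k-means, every center is an actual data point, so a cluster's center can be read as the location of a real quake. Alternating k-medoids can stop at a poor local optimum from a bad start, which is one reason PAM adds a swap search.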
## Cluster n latitude longitude
## 1 Orange 574 28.7941 140.5324
## 2 Green 1993 37.7265 141.7751
## 3 Blue 577 27.6887 128.5829
## [1] "Optimal number of clusters: Silhouette Method"
## 2 3 4 5 6 7 8
## 0.54 0.59 0.46 0.43 0.42 0.41 0.41
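The silhouette values printed above (highest, 0.59, at k = 3) are averages of per-point silhouette widths \(s(i) = (b_i - a_i)/\max(a_i, b_i)\), where \(a_i\) is the mean distance from point \(i\) to the rest of its own cluster and \(b_i\) is the smallest mean distance to another cluster. A Python sketch (not the R code used in the report), assuming Euclidean distance, distinct points, and at least two clusters:

```python
import math


def mean_silhouette(points, labels):
    """Average silhouette width over all points; points in singleton
    clusters get s(i) = 0."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # dist(p, p) = 0 is included in the sum, so divide by len(own) - 1.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Toy points: two tight groups, labelled correctly and then badly.
pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
good = mean_silhouette(pts, [0, 0, 0, 1, 1, 1])
bad = mean_silhouette(pts, [0, 1, 0, 1, 0, 1])
print(round(good, 3), round(bad, 3))
```

Repeating this for each candidate k and keeping the highest average is the selection rule behind the table above.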
As expected, the k-medoids method identifies the regions with the highest concentration of earthquakes, so one would expect Japan's building infrastructure in these zones to be especially well prepared for earthquakes. Notice that the South West and South East clusters are almost the same size (577 and 574 quakes).
Since our data describe events in space and time, we expect the number of occurrences to decrease as magnitude increases. The plot confirms this, and we will model the occurrences with a Poisson distribution.
Here, outliers are earthquakes whose magnitude is greater than expected, i.e., beyond the upper whisker of the box plot.
It appears that earthquakes with M \(> 6.2\) are rare. Again, our data set contains only earthquakes with M \(\geq 5\); if it also contained smaller quakes, the whiskers and the upper fence would probably be lower.
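Box-plot outliers are conventionally the points above Tukey's upper fence, \(Q_3 + 1.5\,\mathrm{IQR}\). A Python sketch of that rule follows (the report uses R); note that R's `boxplot` computes its hinges with `fivenum`, so its quartiles and fence can differ slightly from the `statistics.quantiles` method used here. The magnitudes in the example are toy values.

```python
from statistics import quantiles


def upper_fence(values, whisker=1.5):
    """Tukey's upper fence: Q3 + whisker * IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 + whisker * (q3 - q1)


def outliers(values, whisker=1.5):
    """Values lying above the upper fence."""
    fence = upper_fence(values, whisker)
    return [v for v in values if v > fence]


toy_mags = [5.0, 5.0, 5.1, 5.1, 5.2, 5.2, 5.3, 5.4, 5.5, 5.6, 5.8, 6.0, 7.0]
print(round(upper_fence(toy_mags), 2), outliers(toy_mags))
```

Adding many smaller quakes (below M 5) would lower both quartiles and therefore the fence, which is the effect described above.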
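Since the data appear to follow a Poisson distribution, we will calculate the probability that an earthquake of a specific intensity will happen in the following year or decade. First, let us show the Poisson probability mass function: \[ P(X = x) = \frac{\lambda^x}{x!} e^{-\lambda} \] where \(X\) is the number of earthquakes in an interval of time (here, one year or one decade) and \(\lambda\) is the average number per interval. Our data are from 2000 - 2022; hence, we have 23 years of data, so an intensity observed \(n\) times has \(\lambda = n/23\) per year or \(\lambda = 10n/23\) per decade. The probability of at least one such earthquake is \(P(X \geq 1) = 1 - P(X = 0) = 1 - e^{-\lambda}\). We will calculate this probability for all the intensities.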
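The probabilities printed below appear consistent with \(P(X \geq 1) = 1 - e^{-\lambda}\), with \(\lambda = n/23\) per year (or \(10n/23\) per decade). A Python sketch under that assumption (the report's R code is not shown), using the 2000 - 2022 counts by intensity from the table earlier in this report:

```python
import math

YEARS = 23  # 2000-2022 inclusive

# Counts by intensity for 2000-2022, from the table earlier in this report.
COUNTS = {"Strong": 2856, "Slightly Damaging": 259, "Damaging": 27,
          "Heavily Damaging": 1, "Destructive": 1}


def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with mean lam."""
    return lam ** x * math.exp(-lam) / math.factorial(x)


def prob_at_least_one(count, years_observed, horizon_years):
    """P(X >= 1) over horizon_years, with the rate estimated as
    count / years_observed events per year."""
    lam = count / years_observed * horizon_years
    return 1.0 - poisson_pmf(0, lam)


for name, n in COUNTS.items():
    print(name, round(prob_at_least_one(n, YEARS, 1), 8),
          round(prob_at_least_one(n, YEARS, 10), 7))
```

Because “Heavily Damaging” and “Destructive” both have \(n = 1\), they necessarily get identical probabilities.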
## [1] "The probabilities that an earthquake of a certain intensity will occur in the following year are: "
## Strong Slightly Damaging Damaging Heavily Damaging
## 1.00000000 0.99998713 0.69084517 0.04254663
## Destructive
## 0.04254663
## [1] "The probabilities that an earthquake of a certain intensity will occur in the following decade are: "
## Strong Slightly Damaging Damaging Heavily Damaging
## 1.0000000 1.0000000 0.9999920 0.3525946
## Destructive
## 0.3525946
As shown, the heavily damaging and destructive intensities have the same probability because each occurred exactly once in 2000 - 2022, so both have the same \(\lambda\). With more years of observations, these probabilities would probably change, depending on how many earthquakes of each intensity occurred. The probability of experiencing a “Damaging” earthquake is very high; we can expect this because of the large number of quakes in 2011, and removing 2011 from our data could lower these probabilities.
As stated before, our training set consists of all the events that happened between 2010 - 2015. Let us first train our model using K-Nearest Neighbors (KNN) classification. Since the great majority of earthquakes are in the “Strong” category, if we only used latitude and longitude to train our model, we could expect all of them to be classified as “Strong”; for this reason, we will use the Regions variable. Because intensity is derived directly from magnitude, we expect our models to perform perfectly when trained on magnitude.
Let us see how well our models perform when predicting intensity from the depth of the earthquake. We will create a contingency table and a confusion matrix.
##
## Call:
## train.kknn(formula = intensity ~ depth, data = Japan_Earthquakes_2010_2015, kmax = 50)
##
## Type of response variable: nominal
## Minimal misclassification: 0.08825526
## Best kernel: optimal
## Best k: 10
According to the output, the best k (number of nearest neighbors) for our model is 10. Now, let us create the KNN model with k \(= 10\).
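`train.kknn` chooses k (and a kernel) by leave-one-out cross-validation and uses kernel-weighted votes. The Python sketch below shows only the core idea: unweighted majority voting on one numeric predictor such as depth, with k chosen by leave-one-out error. It omits kernel weighting, so it will not reproduce kknn's numbers; the function names and data are hypothetical.

```python
from collections import Counter


def knn_predict(train_x, train_y, x, k):
    """Unweighted majority vote among the k training points closest to x."""
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def loocv_best_k(xs, ys, kmax):
    """Pick k by leave-one-out cross-validation: predict each point from all
    the others and keep the k with the lowest misclassification rate."""
    best_k, best_err = 1, float("inf")
    for k in range(1, kmax + 1):
        errors = 0
        for i in range(len(xs)):
            rest_x = xs[:i] + xs[i + 1:]
            rest_y = ys[:i] + ys[i + 1:]
            if knn_predict(rest_x, rest_y, xs[i], k) != ys[i]:
                errors += 1
        err = errors / len(xs)
        if err < best_err:
            best_k, best_err = k, err
    return best_k, best_err


# Toy depths (km) and intensities, not the report's data.
depths = [10, 12, 15, 300, 310, 320]
labels = ["Strong"] * 3 + ["Damaging"] * 3
print(loocv_best_k(depths, labels, kmax=3))
print(knn_predict(depths, labels, 305, k=3))
```

With a heavily imbalanced target like intensity, most neighborhoods are dominated by “Strong,” which is why the depth-only model above predicts almost nothing else.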
## KNN_model_intensity_depth
## Strong Slightly Damaging Damaging Heavily Damaging
## 564 1 0 0
## Destructive
## 0
Just by looking at these counts (and knowing beforehand how earthquakes behaved during 2016 - 2021), we can see that this model is poor: it assigns only one earthquake to a category other than “Strong.” Let us see the contingency table and the proportion of correct predictions for both the training set and the testing set:
## Predictions_Based_on_Intensity_by_depth
## Strong Slightly Damaging Damaging Heavily Damaging
## 1455 18 0 0
## Destructive
## 0
## [1] "Contingency table for predicted earthquakes based on depth:"
## Predictions_Based_on_Intensity_by_depth
## Strong Slightly Damaging Damaging Heavily Damaging
## Strong 1332 6 0 0
## Slightly Damaging 113 11 0 0
## Damaging 10 0 0 0
## Heavily Damaging 0 0 0 0
## Destructive 0 1 0 0
## Predictions_Based_on_Intensity_by_depth
## Destructive
## Strong 0
## Slightly Damaging 0
## Damaging 0
## Heavily Damaging 0
## Destructive 0
## [1] "Number of asserted:"
## [1] 0.9117447
## [1] "Contingency table for our knn model and the testing set:"
## KNN_model_intensity_depth
## Strong Slightly Damaging Damaging Heavily Damaging
## Strong 524 1 0 0
## Slightly Damaging 37 0 0 0
## Damaging 3 0 0 0
## Heavily Damaging 0 0 0 0
## Destructive 0 0 0 0
## KNN_model_intensity_depth
## Destructive
## Strong 0
## Slightly Damaging 0
## Damaging 0
## Heavily Damaging 0
## Destructive 0
## [1] "Number of asserted:"
## [1] 0.9274336
We can see that our KNN model has 93% accuracy on the testing set, which looks good; however, always predicting “Strong” would do slightly better (525 of 565, or 93%). Likewise, the accuracy of the predictions based on depth for the training set is 91% (9% error), barely above the 90.8% obtained by always predicting “Strong” (the No Information Rate in the output below). Between 2010 - 2015 we had one destructive, 10 damaging, and 124 slightly damaging earthquakes, yet the model predicted no destructive or damaging quakes and only 18 slightly damaging ones. Given these discrepancies, one may wonder whether the model implementation, or our interpretation of it, is correct. Now, let us see the confusion matrix:
## [1] "Confusion matrix of intensity based on depth:"
## Confusion Matrix and Statistics
##
## Reference
## Prediction Strong Slightly Damaging Damaging Heavily Damaging
## Strong 1332 113 10 0
## Slightly Damaging 6 11 0 0
## Damaging 0 0 0 0
## Heavily Damaging 0 0 0 0
## Destructive 0 0 0 0
## Reference
## Prediction Destructive
## Strong 0
## Slightly Damaging 1
## Damaging 0
## Heavily Damaging 0
## Destructive 0
##
## Overall Statistics
##
## Accuracy : 0.9117
## 95% CI : (0.8961, 0.9257)
## No Information Rate : 0.9084
## P-Value [Acc > NIR] : 0.346
##
## Kappa : 0.1324
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: Strong Class: Slightly Damaging Class: Damaging
## Sensitivity 0.99552 0.088710 0.000000
## Specificity 0.08889 0.994811 1.000000
## Pos Pred Value 0.91546 0.611111 NaN
## Neg Pred Value 0.66667 0.922337 0.993211
## Prevalence 0.90835 0.084182 0.006789
## Detection Rate 0.90428 0.007468 0.000000
## Detection Prevalence 0.98778 0.012220 0.000000
## Balanced Accuracy 0.54220 0.541760 0.500000
## Class: Heavily Damaging Class: Destructive
## Sensitivity NA 0.0000000
## Specificity 1 1.0000000
## Pos Pred Value NA NaN
## Neg Pred Value NA 0.9993211
## Prevalence 0 0.0006789
## Detection Rate 0 0.0000000
## Detection Prevalence 0 0.0000000
## Balanced Accuracy NA 0.5000000
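The overall statistics above can be checked by hand. The Python sketch below (not caret's code) recomputes accuracy, the No Information Rate (the share of the largest reference class, i.e. always predicting “Strong”), and Cohen's kappa from the printed matrix. It shows that 91% accuracy is barely above the 90.8% baseline, which is what the low kappa of 0.13 signals.

```python
# Confusion matrix from the output above: rows = predictions, columns =
# reference, both ordered Strong, Slightly Damaging, Damaging,
# Heavily Damaging, Destructive.
CM = [
    [1332, 113, 10, 0, 0],
    [6, 11, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]


def summary(cm):
    """Return (accuracy, no-information rate, Cohen's kappa)."""
    n = sum(map(sum, cm))
    k = len(cm)
    row = [sum(r) for r in cm]                                 # predicted totals
    col = [sum(cm[i][j] for i in range(k)) for j in range(k)]  # reference totals
    accuracy = sum(cm[i][i] for i in range(k)) / n
    nir = max(col) / n                                         # always predict the largest class
    expected = sum(row[i] * col[i] for i in range(k)) / n ** 2  # chance agreement
    kappa = (accuracy - expected) / (1 - expected)
    return accuracy, nir, kappa


acc, nir, kappa = summary(CM)
print(f"accuracy={acc:.4f} NIR={nir:.4f} kappa={kappa:.4f}")
```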
Let us create a model that will predict the intensity based on the magnitude. As stated before, because intensity is derived directly from magnitude, we can expect a perfect model.
##
## Call:
## train.kknn(formula = intensity ~ mag, data = Japan_Earthquakes_2010_2015, kmax = 50)
##
## Type of response variable: nominal
## Minimal misclassification: 0.0006788866
## Best kernel: optimal
## Best k: 1
The output suggests choosing k = 1.
## [1] "KNN model of intensity based on magnitude"
## KNN_model_intensity_mag
## Strong Slightly Damaging Damaging Heavily Damaging
## 525 37 3 0
## Destructive
## 0
If one compares this table with the 2016 - 2021 table, we can see that they are identical. Since intensity was derived from magnitude, classifying quakes by magnitude essentially tells us nothing new.
## [1] "Predictions of intensity based on magnitude"
## Predictions_Based_on_Intensity_by_mag
## Strong Slightly Damaging Damaging Heavily Damaging
## 1338 124 10 0
## Destructive
## 1
Same as the 2010 - 2015 table.
## [1] "Contingency table of intensity based on magnitude:"
## Predictions_Based_on_Intensity_by_mag
## Strong Slightly Damaging Damaging Heavily Damaging
## Strong 1338 0 0 0
## Slightly Damaging 0 124 0 0
## Damaging 0 0 10 0
## Heavily Damaging 0 0 0 0
## Destructive 0 0 0 0
## Predictions_Based_on_Intensity_by_mag
## Destructive
## Strong 0
## Slightly Damaging 0
## Damaging 0
## Heavily Damaging 0
## Destructive 1
## [1] "asserted:"
## [1] 1
## [1] "Contingency table of intensity based on magnitude for 2016 - 2021:"
##
## KNN_model_intensity_mag Strong Slightly Damaging Damaging Heavily Damaging
## Strong 525 0 0 0
## Slightly Damaging 0 37 0 0
## Damaging 0 0 3 0
## Heavily Damaging 0 0 0 0
## Destructive 0 0 0 0
##
## KNN_model_intensity_mag Destructive
## Strong 0
## Slightly Damaging 0
## Damaging 0
## Heavily Damaging 0
## Destructive 0
## [1] "asserted:"
## [1] 0.9274336
## [1] "Confusion matrix of intensity based on mag:"
## Confusion Matrix and Statistics
##
## Reference
## Prediction Strong Slightly Damaging Damaging Heavily Damaging
## Strong 1338 0 0 0
## Slightly Damaging 0 124 0 0
## Damaging 0 0 10 0
## Heavily Damaging 0 0 0 0
## Destructive 0 0 0 0
## Reference
## Prediction Destructive
## Strong 0
## Slightly Damaging 0
## Damaging 0
## Heavily Damaging 0
## Destructive 1
##
## Overall Statistics
##
## Accuracy : 1
## 95% CI : (0.9975, 1)
## No Information Rate : 0.9084
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 1
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: Strong Class: Slightly Damaging Class: Damaging
## Sensitivity 1.0000 1.00000 1.000000
## Specificity 1.0000 1.00000 1.000000
## Pos Pred Value 1.0000 1.00000 1.000000
## Neg Pred Value 1.0000 1.00000 1.000000
## Prevalence 0.9084 0.08418 0.006789
## Detection Rate 0.9084 0.08418 0.006789
## Detection Prevalence 0.9084 0.08418 0.006789
## Balanced Accuracy 1.0000 1.00000 1.000000
## Class: Heavily Damaging Class: Destructive
## Sensitivity NA 1.0000000
## Specificity 1 1.0000000
## Pos Pred Value NA 1.0000000
## Neg Pred Value NA 1.0000000
## Prevalence 0 0.0006789
## Detection Rate 0 0.0006789
## Detection Prevalence 0 0.0006789
## Balanced Accuracy NA 1.0000000
Now we address the question: are deeper earthquakes more powerful? Let us first create a graph comparing earthquakes by magnitude and depth. We can also make a map, sizing the quakes by depth and coloring them by intensity.
Depth vs Magnitude/Intensity plot:
The graph above shows that most earthquakes happened at shallow depths, and it does not suggest that deeper quakes are stronger. Notice that the largest earthquake (9.1) occurred at a shallow depth, while a 7.8 quake occurred at a depth of 664 km. Smaller quakes also happened at shallow depths. So, is depth an indicator of the magnitude of a quake? Based on this evidence, apparently not.
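One way to put a number on this visual impression would be a correlation coefficient between depth and magnitude; a value near zero would support the conclusion above. The report does not compute one, so the Python sketch below is only an illustration with toy inputs. Pearson's r measures linear association only; a rank correlation such as Spearman's would be more robust to the skewed depth distribution.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences
    (undefined if either sequence is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


print(pearson([1, 2, 3], [2, 4, 6]))  # perfectly increasing
print(pearson([1, 2, 3], [3, 2, 1]))  # perfectly decreasing
```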
##
## Call:
## train.kknn(formula = mag ~ depth, data = Japan_Earthquakes_2010_2015, kmax = 50)
##
## Type of response variable: continuous
## minimal mean absolute error: 0.2835633
## Minimal mean squared error: 0.1615699
## Best kernel: optimal
## Best k: 8
The output suggests selecting k \(= 8\). Let us see what our model predicts for the magnitude of the quake based on its depth:
## [1] "KNN model of magnitude based on depth:"
## KNN_model_mag
## 5 5.1 5.2 5.3 5.4 5.5 5.6 5.7 5.8 5.9 6 6.1 6.2 6.3 6.4 6.5 6.6 6.7 6.8 6.9
## 303 74 63 30 24 23 17 5 8 3 4 8 1 1 0 1 0 0 0 0
## 7 7.1 7.3 7.4 7.7 7.8 7.9 9.1
## 0 0 0 0 0 0 0 0
The table shows that the model predicts no “Damaging” earthquakes between 2016 - 2021, but we know that three occurred. The contingency tables will not be presented, only the accuracy.
## [1] "asserted for 2010 - 2015:"
## [1] 0.00203666
## [1] "asserted for 2016 - 2021:"
## [1] 0.2070796
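Our accuracy is very low, not even reaching 50%, so this model fails. Note that magnitude is continuous, so the proportion of predictions that match exactly is a very strict criterion; the mean absolute error reported above (about 0.28) is arguably more informative. To test whether location and depth are good predictors, we will now train KNN models using these variables as predictors.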
##
## Call:
## train.kknn(formula = intensity ~ Regions + depth, data = Japan_Earthquakes_2010_2015, kmax = 50)
##
## Type of response variable: nominal
## Minimal misclassification: 0.08961303
## Best kernel: optimal
## Best k: 6
## [1] "asserted for 2010 - 2015:"
## [1] 0.9239647
This model, which predicts intensity from region and depth, has 92% accuracy on the training set, only slightly above the 90.8% obtained by always predicting “Strong.”
##
## Call:
## train.kknn(formula = Regions ~ intensity + depth, data = Japan_Earthquakes_2010_2015, kmax = 50)
##
## Type of response variable: nominal
## Minimal misclassification: 0.299389
## Best kernel: optimal
## Best k: 18
## [1] "asserted for 2010 - 2015:"
## [1] 0.712831
When it comes to determining the region based on intensity and depth, the model gave us 71% accuracy. Why is this accuracy lower than that of the previous model? A likely reason is class imbalance: since most quakes are “Strong,” a model predicting intensity can score highly by labeling nearly every quake as “Strong,” whereas regions are more evenly spread. Here, predicting “East” for every quake would already be right about 68% of the time (1,001 of 1,473).
##
## Naive Bayes Classifier for Discrete Predictors
##
## Call:
## naiveBayes.default(x = X, y = Y, laplace = laplace)
##
## A-priori probabilities:
## Y
## Strong Slightly Damaging Damaging Heavily Damaging
## 0.9083969466 0.0823791349 0.0085877863 0.0003180662
## Destructive
## 0.0003180662
##
## Conditional probabilities:
## Regions
## Y East North North East South South East
## Strong 0.477591036 0.036764706 0.124299720 0.126400560 0.106092437
## Slightly Damaging 0.482625483 0.057915058 0.135135135 0.138996139 0.050193050
## Damaging 0.555555556 0.000000000 0.074074074 0.185185185 0.148148148
## Heavily Damaging 0.000000000 0.000000000 1.000000000 0.000000000 0.000000000
## Destructive 1.000000000 0.000000000 0.000000000 0.000000000 0.000000000
## Regions
## Y South West West
## Strong 0.126050420 0.002801120
## Slightly Damaging 0.127413127 0.007722008
## Damaging 0.037037037 0.000000000
## Heavily Damaging 0.000000000 0.000000000
## Destructive 0.000000000 0.000000000
##
## depth
## Y [,1] [,2]
## Strong 49.81767 78.43948
## Slightly Damaging 66.03656 112.95927
## Damaging 88.20296 149.24884
## Heavily Damaging 27.00000 NA
## Destructive 29.00000 NA
Here one can see that, for most intensities, the conditional probability is highest in the East region. The table below shows the number of quakes by region. Notice that no earthquake occurred in the North West region.
##
## East North North East South South East South West West
## 1505 120 393 402 320 394 10
Our Bayes classifier correctly predicted the intensity of \(48\%\) of the quakes based on region and depth. Compared to the KNN method, this algorithm gives a much lower accuracy.
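The printed model can be applied by hand. The Python sketch below (not the report's code) scores one quake the way an e1071-style naive Bayes does: prior × P(region | class) × a Gaussian density for depth, using the printed mean (`[,1]`) and standard deviation (`[,2]`). Only the three classes with a defined depth standard deviation are included, which is a simplification. The printed priors also match the 2000 - 2022 counts (for example 27/3144 ≈ 0.00859 for “Damaging”), which suggests this classifier was fit on the full 2000 - 2022 data rather than the 2010 - 2015 training set.

```python
import math

# Values copied from the naiveBayes output above (three classes only).
PRIOR = {"Strong": 0.9083969466, "Slightly Damaging": 0.0823791349,
         "Damaging": 0.0085877863}
P_EAST = {"Strong": 0.477591036, "Slightly Damaging": 0.482625483,
          "Damaging": 0.555555556}
DEPTH = {"Strong": (49.81767, 78.43948), "Slightly Damaging": (66.03656, 112.95927),
         "Damaging": (88.20296, 149.24884)}  # (mean, sd) in km


def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def posterior_east(depth):
    """Naive Bayes posterior P(class | region = East, depth), assuming region
    and depth are independent given the class and depth is Gaussian."""
    scores = {c: PRIOR[c] * P_EAST[c] * normal_pdf(depth, *DEPTH[c]) for c in PRIOR}
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}


for d in (30, 600):
    post = posterior_east(d)
    print(d, max(post, key=post.get))
```

For a 30 km deep quake in the East the posterior is dominated by “Strong”; for a 600 km deep one, the wider depth spread of the rarer classes makes “Damaging” the most probable, illustrating how a Gaussian depth assumption behaves when a class has few observations.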
To reiterate, Japan is located in the Pacific Ring of Fire, an area known for its recurrent seismic activity. Over the years, Japan has been hit by many damaging earthquakes, including one of magnitude 9.1. The region of Japan most prone to earthquakes is the East. Some supervised learning algorithms gave us high accuracy; others did not.
Can you predict earthquakes? U.S. Geological Survey. (n.d.). Retrieved April 26, 2023, from https://www.usgs.gov/faqs/can-you-predict-earthquakes
Earthquakes. U.S. Geological Survey. (n.d.). Retrieved April 26, 2023, from https://www.usgs.gov/programs/earthquake-hazards/earthquakes
Earthquakes: Prediction, forecasting and Mitigation. The Geological Society of London. (n.d.). Retrieved April 26, 2023, from https://www.geolsoc.org.uk/earthquake-briefing
Matthew Gerstenberger, D. N. (2023, February 13). Nobody can predict earthquakes, but we can forecast them. Here's how. PreventionWeb. Retrieved April 26, 2023, from https://www.preventionweb.net/news/nobody-can-predict-earthquakes-we-can-forecast-them-heres-how
O’Malley, A. (2023, January 25). Construction expertise from Japan: Earthquake proof buildings. PlanRadar. Retrieved May 1, 2023, from https://www.planradar.com/gb/japan-earthquake-proof-buildings/
Wikimedia Foundation. (2023, April 18). Earthquake. Wikipedia. Retrieved April 26, 2023, from https://en.wikipedia.org/wiki/Earthquake