This machine learning project focuses on a critical challenge: detecting hazardous asteroids. The primary goal is to build a robust predictive model that can identify asteroids posing a threat to Earth. By accurately pinpointing potential hazards, the model aims to support decision-making in planetary defense, mitigate potential risks, and enable informed action on planetary safety.
The result of this project offers several valuable advantages:
- Early Alerts: Timely warnings for potential asteroid threats.
- Impact Reduction: Minimized risk from hazardous asteroids.
- Informed Decisions: Data-driven choices for emergency responses.
- Planetary Safety: Improved defense against celestial hazards.
- Scientific Insights: Enhanced understanding of asteroid behavior.
- Public Awareness: Raised awareness and educational opportunities.
- Global Collaboration: International partnerships for defense efforts.
- Mission Planning: Optimized space mission targeting.
- Policy Influence: Guidance for planetary defense policies.
- Inspiration: Catalyst for innovation and space exploration.
The data comes from the NASA API at http://neo.jpl.nasa.gov/. This API is maintained by the SpaceRocks Team: David Greenfield, Arezu Sarvestani, Jason English and Peter Baunach.
## Rows: 4,687
## Columns: 40
## $ Neo.Reference.ID <int> 3703080, 3723955, 2446862, 3092506, 35147…
## $ Name <int> 3703080, 3723955, 2446862, 3092506, 35147…
## $ Absolute.Magnitude <dbl> 21.6, 21.3, 20.3, 27.4, 21.6, 19.6, 19.6,…
## $ Est.Dia.in.KM.min. <dbl> 0.127219878, 0.146067964, 0.231502122, 0.…
## $ Est.Dia.in.KM.max. <dbl> 0.28447230, 0.32661790, 0.51765448, 0.019…
## $ Est.Dia.in.M.min. <dbl> 127.219879, 146.067964, 231.502122, 8.801…
## $ Est.Dia.in.M.max. <dbl> 284.47230, 326.61790, 517.65448, 19.68067…
## $ Est.Dia.in.Miles.min. <dbl> 0.079050743, 0.090762397, 0.143848705, 0.…
## $ Est.Dia.in.Miles.max. <dbl> 0.17676284, 0.20295089, 0.32165548, 0.012…
## $ Est.Dia.in.Feet.min. <dbl> 417.38807, 479.22562, 759.52142, 28.87620…
## $ Est.Dia.in.Feet.max. <dbl> 933.30809, 1071.58106, 1698.34153, 64.569…
## $ Close.Approach.Date <fct> 1995-01-01, 1995-01-01, 1995-01-08, 1995-…
## $ Epoch.Date.Close.Approach <dbl> 788947200000, 788947200000, 789552000000,…
## $ Relative.Velocity.km.per.sec <dbl> 6.115834, 18.113985, 7.590711, 11.173874,…
## $ Relative.Velocity.km.per.hr <dbl> 22017.004, 65210.346, 27326.560, 40225.94…
## $ Miles.per.hour <dbl> 13680.510, 40519.173, 16979.662, 24994.84…
## $ Miss.Dist..Astronomical. <dbl> 0.41948253, 0.38301446, 0.05095602, 0.285…
## $ Miss.Dist..lunar. <dbl> 163.17871, 148.99263, 19.82189, 110.99039…
## $ Miss.Dist..kilometers. <dbl> 62753692, 57298148, 7622912, 42683616, 61…
## $ Miss.Dist..miles. <dbl> 38993336, 35603420, 4736658, 26522368, 37…
## $ Orbiting.Body <fct> Earth, Earth, Earth, Earth, Earth, Earth,…
## $ Orbit.ID <int> 17, 21, 22, 7, 25, 40, 43, 22, 100, 30, 1…
## $ Orbit.Determination.Date <fct> 2017-04-06 08:36:37, 2017-04-06 08:32:49,…
## $ Orbit.Uncertainity <int> 5, 3, 0, 6, 1, 1, 1, 0, 0, 0, 6, 4, 6, 0,…
## $ Minimum.Orbit.Intersection <dbl> 0.025281900, 0.186935000, 0.043057900, 0.…
## $ Jupiter.Tisserand.Invariant <dbl> 4.634, 5.457, 4.557, 5.093, 5.154, 4.724,…
## $ Epoch.Osculation <dbl> 2458000, 2458000, 2458000, 2458000, 24580…
## $ Eccentricity <dbl> 0.4255491, 0.3516743, 0.3482483, 0.216578…
## $ Semi.Major.Axis <dbl> 1.4070113, 1.1077760, 1.4588238, 1.255902…
## $ Inclination <dbl> 6.025981, 28.412996, 4.237961, 7.905894, …
## $ Asc.Node.Longitude <dbl> 314.373913, 136.717242, 259.475979, 57.17…
## $ Orbital.Period <dbl> 609.5998, 425.8693, 643.5802, 514.0821, 4…
## $ Perihelion.Distance <dbl> 0.8082589, 0.7181996, 0.9507910, 0.983901…
## $ Perihelion.Arg <dbl> 57.257470, 313.091975, 248.415038, 18.707…
## $ Aphelion.Dist <dbl> 2.005764, 1.497352, 1.966857, 1.527904, 1…
## $ Perihelion.Time <dbl> 2458162, 2457795, 2458120, 2457902, 24578…
## $ Mean.Anomaly <dbl> 264.837533, 173.741112, 292.893654, 68.74…
## $ Mean.Motion <dbl> 0.5905514, 0.8453298, 0.5593708, 0.700277…
## $ Equinox <fct> J2000, J2000, J2000, J2000, J2000, J2000,…
## $ Hazardous <fct> True, False, True, False, True, False, Fa…
We need to do two things:
- Restructure the target variable (Hazardous)
- Remove unneeded columns
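# Recode the target: "True" becomes 1, everything else 0, stored as a factor
asteroids$Hazardous <- ifelse(asteroids$Hazardous == "True", 1, 0) %>% as.factor()
asteroids %>% head(3)
# Drop identifier and date columns that are not needed for modeling
asteroids <- asteroids %>%
  select(-Close.Approach.Date,
         -Orbit.Determination.Date,
         -Neo.Reference.ID,
         -Name,
         -Orbit.ID)
asteroids %>% glimpse()
## Rows: 4,687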
## Columns: 35
## $ Absolute.Magnitude <dbl> 21.6, 21.3, 20.3, 27.4, 21.6, 19.6, 19.6,…
## $ Est.Dia.in.KM.min. <dbl> 0.127219878, 0.146067964, 0.231502122, 0.…
## $ Est.Dia.in.KM.max. <dbl> 0.28447230, 0.32661790, 0.51765448, 0.019…
## $ Est.Dia.in.M.min. <dbl> 127.219879, 146.067964, 231.502122, 8.801…
## $ Est.Dia.in.M.max. <dbl> 284.47230, 326.61790, 517.65448, 19.68067…
## $ Est.Dia.in.Miles.min. <dbl> 0.079050743, 0.090762397, 0.143848705, 0.…
## $ Est.Dia.in.Miles.max. <dbl> 0.17676284, 0.20295089, 0.32165548, 0.012…
## $ Est.Dia.in.Feet.min. <dbl> 417.38807, 479.22562, 759.52142, 28.87620…
## $ Est.Dia.in.Feet.max. <dbl> 933.30809, 1071.58106, 1698.34153, 64.569…
## $ Epoch.Date.Close.Approach <dbl> 788947200000, 788947200000, 789552000000,…
## $ Relative.Velocity.km.per.sec <dbl> 6.115834, 18.113985, 7.590711, 11.173874,…
## $ Relative.Velocity.km.per.hr <dbl> 22017.004, 65210.346, 27326.560, 40225.94…
## $ Miles.per.hour <dbl> 13680.510, 40519.173, 16979.662, 24994.84…
## $ Miss.Dist..Astronomical. <dbl> 0.41948253, 0.38301446, 0.05095602, 0.285…
## $ Miss.Dist..lunar. <dbl> 163.17871, 148.99263, 19.82189, 110.99039…
## $ Miss.Dist..kilometers. <dbl> 62753692, 57298148, 7622912, 42683616, 61…
## $ Miss.Dist..miles. <dbl> 38993336, 35603420, 4736658, 26522368, 37…
## $ Orbiting.Body <fct> Earth, Earth, Earth, Earth, Earth, Earth,…
## $ Orbit.Uncertainity <int> 5, 3, 0, 6, 1, 1, 1, 0, 0, 0, 6, 4, 6, 0,…
## $ Minimum.Orbit.Intersection <dbl> 0.025281900, 0.186935000, 0.043057900, 0.…
## $ Jupiter.Tisserand.Invariant <dbl> 4.634, 5.457, 4.557, 5.093, 5.154, 4.724,…
## $ Epoch.Osculation <dbl> 2458000, 2458000, 2458000, 2458000, 24580…
## $ Eccentricity <dbl> 0.4255491, 0.3516743, 0.3482483, 0.216578…
## $ Semi.Major.Axis <dbl> 1.4070113, 1.1077760, 1.4588238, 1.255902…
## $ Inclination <dbl> 6.025981, 28.412996, 4.237961, 7.905894, …
## $ Asc.Node.Longitude <dbl> 314.373913, 136.717242, 259.475979, 57.17…
## $ Orbital.Period <dbl> 609.5998, 425.8693, 643.5802, 514.0821, 4…
## $ Perihelion.Distance <dbl> 0.8082589, 0.7181996, 0.9507910, 0.983901…
## $ Perihelion.Arg <dbl> 57.257470, 313.091975, 248.415038, 18.707…
## $ Aphelion.Dist <dbl> 2.005764, 1.497352, 1.966857, 1.527904, 1…
## $ Perihelion.Time <dbl> 2458162, 2457795, 2458120, 2457902, 24578…
## $ Mean.Anomaly <dbl> 264.837533, 173.741112, 292.893654, 68.74…
## $ Mean.Motion <dbl> 0.5905514, 0.8453298, 0.5593708, 0.700277…
## $ Equinox <fct> J2000, J2000, J2000, J2000, J2000, J2000,…
## $ Hazardous <fct> 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0,…
The target proportion is imbalanced. Let’s upsample!
up_asteroids <- upSample(x = asteroids %>% select(-Hazardous),
y = asteroids$Hazardous,
yname = "Hazardous")
up_asteroids$Hazardous %>% table() %>% barplot()
Cool. It’s balanced now!
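To make the upsampling step concrete, here is a minimal base R sketch of the same idea on toy data: each class is resampled with replacement until it matches the size of the largest class. The `upsample_minority` helper and the toy values are hypothetical and only illustrate the technique; they are not the internals of `upSample()`.

```r
# Toy stand-in for the asteroid data (hypothetical values).
set.seed(42)
toy <- data.frame(
  Absolute.Magnitude = round(runif(10, 15, 30), 1),
  Hazardous = factor(c(1, 0, 0, 0, 0, 1, 0, 0, 0, 0))
)

# Hypothetical helper: resample every class with replacement
# until it matches the size of the largest class.
upsample_minority <- function(df, target) {
  groups <- split(df, df[[target]])
  n_max <- max(sapply(groups, nrow))
  balanced <- lapply(groups, function(g) {
    extra <- sample(seq_len(nrow(g)), n_max - nrow(g), replace = TRUE)
    rbind(g, g[extra, , drop = FALSE])
  })
  out <- do.call(rbind, balanced)
  rownames(out) <- NULL
  out
}

up_toy <- upsample_minority(toy, "Hazardous")
table(up_toy$Hazardous)
```

Because the added rows are exact copies of existing minority rows, the order of operations matters: upsampling before a train/test split can place copies of the same asteroid on both sides of the split.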
The columns Equinox and Orbiting.Body have zero variance. Let’s remove them!
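The removal code is not shown here, so the following is only a sketch of one way to do it in base R, on a hypothetical toy data frame: a column has zero variance when it holds a single distinct value, so those columns can be detected and dropped by name.

```r
# Toy data frame with two constant columns (hypothetical values).
toy <- data.frame(
  Eccentricity = c(0.43, 0.35, 0.35, 0.22),
  Equinox = factor(rep("J2000", 4)),
  Orbiting.Body = factor(rep("Earth", 4)),
  Hazardous = factor(c(1, 0, 1, 0))
)

# A column has zero variance when it contains at most one distinct value.
is_constant <- vapply(toy, function(col) length(unique(col)) <= 1, logical(1))
names(toy)[is_constant]

# Keep only the columns that actually vary.
toy_clean <- toy[, !is_constant, drop = FALSE]
names(toy_clean)
```

On the real data the same two lines would be applied to `up_asteroids`; `caret::nearZeroVar()` is an alternative that also flags columns that are nearly constant.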
## Absolute.Magnitude Est.Dia.in.KM.min. Est.Dia.in.KM.max. Est.Dia.in.M.min.
## Min. :11.16 Min. : 0.001011 Min. : 0.00226 Min. : 1.011
## 1st Qu.:19.60 1st Qu.: 0.076658 1st Qu.: 0.17141 1st Qu.: 76.658
## Median :21.00 Median : 0.167708 Median : 0.37501 Median : 167.708
## Mean :21.40 Mean : 0.249889 Mean : 0.55877 Mean : 249.889
## 3rd Qu.:22.70 3rd Qu.: 0.319562 3rd Qu.: 0.71456 3rd Qu.: 319.562
## Max. :32.10 Max. :15.579552 Max. :34.83694 Max. :15579.552
## Est.Dia.in.M.max. Est.Dia.in.Miles.min. Est.Dia.in.Miles.max.
## Min. : 2.26 Min. :0.000628 Min. : 0.001404
## 1st Qu.: 171.41 1st Qu.:0.047633 1st Qu.: 0.106510
## Median : 375.01 Median :0.104209 Median : 0.233019
## Mean : 558.77 Mean :0.155274 Mean : 0.347203
## 3rd Qu.: 714.56 3rd Qu.:0.198566 3rd Qu.: 0.444008
## Max. :34836.94 Max. :9.680682 Max. :21.646663
## Est.Dia.in.Feet.min. Est.Dia.in.Feet.max. Epoch.Date.Close.Approach
## Min. : 3.32 Min. : 7.41 Min. :7.889e+11
## 1st Qu.: 251.50 1st Qu.: 562.37 1st Qu.:9.905e+11
## Median : 550.22 Median : 1230.34 Median :1.182e+12
## Mean : 819.85 Mean : 1833.23 Mean :1.164e+12
## 3rd Qu.: 1048.43 3rd Qu.: 2344.36 3rd Qu.:1.344e+12
## Max. :51114.02 Max. :114294.42 Max. :1.473e+12
## Relative.Velocity.km.per.sec Relative.Velocity.km.per.hr Miles.per.hour
## Min. : 0.3355 Min. : 1208 Min. : 750.5
## 1st Qu.: 9.6670 1st Qu.: 34801 1st Qu.:21624.2
## Median :14.2036 Median : 51133 Median :31772.0
## Mean :15.2773 Mean : 54998 Mean :34173.7
## 3rd Qu.:19.5768 3rd Qu.: 70476 3rd Qu.:43791.3
## Max. :44.6337 Max. :160681 Max. :99841.2
## Miss.Dist..Astronomical. Miss.Dist..lunar. Miss.Dist..kilometers.
## Min. :0.0001779 Min. : 0.06919 Min. : 26610
## 1st Qu.:0.1391497 1st Qu.: 54.12924 1st Qu.:20816502
## Median :0.2732463 Median :106.29281 Median :40877064
## Mean :0.2626487 Mean :102.17036 Mean :39291694
## 3rd Qu.:0.3901572 3rd Qu.:151.77116 3rd Qu.:58366688
## Max. :0.4998841 Max. :194.45491 Max. :74781600
## Miss.Dist..miles. Orbit.Uncertainity Minimum.Orbit.Intersection
## Min. : 16535 Min. :0.000 Min. :0.0000021
## 1st Qu.:12934775 1st Qu.:0.000 1st Qu.:0.0127533
## Median :25399830 Median :1.000 Median :0.0296974
## Mean :24414727 Mean :2.578 Mean :0.0582420
## 3rd Qu.:36267380 3rd Qu.:6.000 3rd Qu.:0.0642087
## Max. :46467132 Max. :9.000 Max. :0.4778910
## Jupiter.Tisserand.Invariant Epoch.Osculation Eccentricity
## Min. :2.196 Min. :2450164 Min. :0.007522
## 1st Qu.:4.069 1st Qu.:2458000 1st Qu.:0.269198
## Median :5.085 Median :2458000 Median :0.408658
## Mean :5.053 Mean :2457756 Mean :0.413052
## 3rd Qu.:5.990 3rd Qu.:2458000 3rd Qu.:0.546893
## Max. :9.025 Max. :2458020 Max. :0.960261
## Semi.Major.Axis Inclination Asc.Node.Longitude Orbital.Period
## Min. :0.6159 Min. : 0.01451 Min. : 0.0019 Min. : 176.6
## 1st Qu.:1.0061 1st Qu.: 4.85172 1st Qu.: 84.3511 1st Qu.: 368.6
## Median :1.2332 Median : 9.97538 Median :174.1097 Median : 500.2
## Mean :1.3925 Mean :13.46782 Mean :174.3131 Mean : 629.7
## 3rd Qu.:1.6554 3rd Qu.:19.80588 3rd Qu.:259.4328 3rd Qu.: 777.9
## Max. :5.0720 Max. :75.40667 Max. :359.9059 Max. :4172.2
## Perihelion.Distance Perihelion.Arg Aphelion.Dist Perihelion.Time
## Min. :0.08074 Min. : 0.0069 Min. :0.8038 Min. :2450100
## 1st Qu.:0.58926 1st Qu.: 97.4822 1st Qu.:1.3002 1st Qu.:2457826
## Median :0.78809 Median :193.9037 Median :1.6712 Median :2457983
## Mean :0.76705 Mean :184.6180 Mean :2.0180 Mean :2457758
## 3rd Qu.:0.95089 3rd Qu.:268.1775 3rd Qu.:2.4768 3rd Qu.:2458108
## Max. :1.29983 Max. :359.9931 Max. :8.9839 Max. :2458839
## Mean.Anomaly Mean.Motion
## Min. : 0.0032 Min. :0.08628
## 1st Qu.: 96.2437 1st Qu.:0.46277
## Median :191.2148 Median :0.71971
## Mean :187.2033 Mean :0.74210
## 3rd Qu.:281.9666 3rd Qu.:0.97666
## Max. :359.9180 Max. :2.03900
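The summary shows no missing values, but not every numerical column is normally distributed: the estimated diameters and Orbital.Period, for example, have means well above their medians and very large maxima, which indicates right skew. Let’s move on to splitting the data for validation!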
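A quick numeric check of distribution shape is to compare the mean with the median and to compute the sample skewness. The sketch below uses simulated data and a hypothetical `skewness` helper (moment-based), so it shows the check itself rather than results for the asteroid columns.

```r
# Hypothetical helper: moment-based sample skewness.
skewness <- function(x) {
  x <- x[!is.na(x)]
  mean((x - mean(x))^3) / (mean((x - mean(x))^2)^1.5)
}

set.seed(7)
symmetric    <- rnorm(1000)   # roughly bell-shaped
right_skewed <- rlnorm(1000)  # long right tail

skewness(symmetric)      # close to 0
skewness(right_skewed)   # clearly positive
mean(right_skewed) > median(right_skewed)  # mean pulled above the median
```

Applied to the real data, something like `sapply(up_asteroids[sapply(up_asteroids, is.numeric)], skewness)` would give one value per numeric column.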
Because we don’t have a lot of records, we will set the training proportion to 85% and hold out the remaining 15% for testing.
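The splitting code is not shown, so here is a minimal base R sketch of an 85/15 split that keeps the class proportions equal in both parts. The toy data and variable names other than `X_test` and `y_test` are assumptions made for illustration.

```r
set.seed(123)
# Toy balanced data set (hypothetical values).
toy <- data.frame(
  x = rnorm(200),
  Hazardous = factor(rep(c(0, 1), each = 100))
)

# Pick 85% of the row indices within each class (stratified sampling).
by_class <- split(seq_len(nrow(toy)), toy$Hazardous)
train_idx <- unlist(lapply(by_class, function(idx) {
  idx[sample.int(length(idx), size = round(0.85 * length(idx)))]
}), use.names = FALSE)

train <- toy[train_idx, ]
test  <- toy[-train_idx, ]

# Separate predictors and target for the test set.
X_test <- test[, names(test) != "Hazardous", drop = FALSE]
y_test <- test$Hazardous

table(train$Hazardous)
table(test$Hazardous)
```

`caret::createDataPartition(toy$Hazardous, p = 0.85, list = FALSE)` produces a comparable stratified index. If upsampling is applied only to `train` after a split like this, no duplicated row can appear in both sets.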
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 549 35
## 1 40 554
##
## Accuracy : 0.9363
## 95% CI : (0.9208, 0.9496)
## No Information Rate : 0.5
## P-Value [Acc > NIR] : <2e-16
##
## Kappa : 0.8727
##
## Mcnemar's Test P-Value : 0.6442
##
## Sensitivity : 0.9321
## Specificity : 0.9406
## Pos Pred Value : 0.9401
## Neg Pred Value : 0.9327
## Prevalence : 0.5000
## Detection Rate : 0.4660
## Detection Prevalence : 0.4958
## Balanced Accuracy : 0.9363
##
## 'Positive' Class : 0
##
The focused metric for our evaluation is Accuracy. Why? Because in the realm of hazardous asteroid detection, it’s crucial to ensure reliable overall performance in identifying both hazardous and non-hazardous instances.
The Naive Bayes model has demonstrated an overall Accuracy of 94%, accompanied by a Sensitivity of 93% and a Specificity of 94%. Because the output lists class 0 (non-hazardous) as the 'Positive' class, Sensitivity here measures how often non-hazardous asteroids are classified correctly, while Specificity measures how often hazardous asteroids are correctly flagged. The high Specificity is therefore the critical capability for early detection and accurate assessment of potential threats, and the high Sensitivity shows that few harmless asteroids are raised as false alarms. These results collectively contribute to our ability to proactively mitigate the risk posed by hazardous celestial bodies and enhance our planetary safety measures.
NOTE: Because the results can change each time the R Markdown document is rendered, the scores generated when the document is converted to HTML might differ slightly from those quoted in the text.
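As a worked check, the headline metrics can be recomputed by hand from the four counts in the Naive Bayes confusion matrix above. This is a base R sketch; the only inputs are the counts printed in the output.

```r
# Counts copied from the Naive Bayes confusion matrix above
# (rows = prediction, columns = reference).
cm <- matrix(c(549, 40, 35, 554), nrow = 2,
             dimnames = list(Prediction = c("0", "1"),
                             Reference  = c("0", "1")))

# Share of all test rows that were classified correctly.
accuracy <- sum(diag(cm)) / sum(cm)

# The output lists 'Positive' Class : 0, so sensitivity is the
# recall of class 0 and specificity is the recall of class 1.
sensitivity <- cm["0", "0"] / sum(cm[, "0"])
specificity <- cm["1", "1"] / sum(cm[, "1"])

round(c(accuracy = accuracy,
        sensitivity = sensitivity,
        specificity = specificity), 4)
```

The three values match the printed Accuracy, Sensitivity and Specificity. To have caret report hazardous asteroids as the positive class instead, `confusionMatrix()` accepts a `positive = "1"` argument.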
pred_lgr <- ifelse(predict(model_lgr, X_test) > 0.5, 1, 0) %>% as.factor()
confusionMatrix(pred_lgr, y_test)
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 558 37
## 1 31 552
##
## Accuracy : 0.9423
## 95% CI : (0.9274, 0.9549)
## No Information Rate : 0.5
## P-Value [Acc > NIR] : <2e-16
##
## Kappa : 0.8846
##
## Mcnemar's Test P-Value : 0.5443
##
## Sensitivity : 0.9474
## Specificity : 0.9372
## Pos Pred Value : 0.9378
## Neg Pred Value : 0.9468
## Prevalence : 0.5000
## Detection Rate : 0.4737
## Detection Prevalence : 0.5051
## Balanced Accuracy : 0.9423
##
## 'Positive' Class : 0
##
The logistic regression model exhibited an impressive Accuracy of 94%, alongside a Sensitivity of 95% and a Specificity of 94%. As the 'Positive' class in the output is 0 (non-hazardous), the Specificity is the share of hazardous asteroids that the model correctly flags, which is of paramount importance for early detection and mitigation efforts, contributing to enhanced planetary safety. The high Sensitivity shows that non-hazardous asteroids are rarely misclassified as threats.
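One detail of the thresholding step is worth spelling out. The fitting code for `model_lgr` is not shown; if it is a `glm()` fit with `family = binomial` (an assumption), `predict()` returns log-odds by default, and `type = "response"` is needed before comparing against 0.5. A minimal sketch on simulated data:

```r
set.seed(1)
# Simulated data with one predictor (hypothetical values).
toy <- data.frame(x = rnorm(200))
toy$Hazardous <- factor(rbinom(200, 1, plogis(2 * toy$x)))

fit <- glm(Hazardous ~ x, data = toy, family = binomial)

log_odds <- predict(fit, toy)                     # default: type = "link"
prob     <- predict(fit, toy, type = "response")  # probabilities in [0, 1]

# A 0.5 cut on probabilities is the same as a 0 cut on log-odds.
pred <- factor(ifelse(prob > 0.5, 1, 0), levels = c(0, 1))
table(pred, toy$Hazardous)
```

Cutting the log-odds at 0.5 instead would correspond to a probability threshold of roughly 0.62, so the two versions can label borderline cases differently.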
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 589 0
## 1 0 589
##
## Accuracy : 1
## 95% CI : (0.9969, 1)
## No Information Rate : 0.5
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 1
##
## Mcnemar's Test P-Value : NA
##
## Sensitivity : 1.0
## Specificity : 1.0
## Pos Pred Value : 1.0
## Neg Pred Value : 1.0
## Prevalence : 0.5
## Detection Rate : 0.5
## Detection Prevalence : 0.5
## Balanced Accuracy : 1.0
##
## 'Positive' Class : 0
##
The random forest model demonstrated remarkable performance, achieving an Accuracy of 100% on the test set, making it the winning model of this project! Furthermore, the model achieved a Sensitivity of 100% and a Specificity of 100%: every hazardous and every non-hazardous asteroid in the test set was classified correctly. Such high sensitivity and specificity are instrumental in identifying and classifying objects accurately, and they are of paramount importance for effective early detection and risk assessment, contributing to enhanced planetary safety and preparedness.
My way of explaining the AUC of ROC score is that it reflects how well the model separates the two classes across every possible threshold: it is the probability that a randomly chosen hazardous asteroid receives a higher predicted score than a randomly chosen non-hazardous one. A score of 0.5 is no better than guessing, and a score of 1 means perfect separation. As people who use the model, we want the predicted probabilities to rank hazardous asteroids reliably above non-hazardous ones, since we rely on them for making decisions. This is why the AUC of ROC score is a crucial measure to determine whether the model is prepared for practical use.
pred_rf_raw <- predict(model_rf, X_test, type = "prob")
plotROC(pred_rf_raw[, 2], y_test) # This function is from my personal package
Magnificent. Why? Because the closer the AUC is to 1, the better the model is at telling hazardous asteroids apart from non-hazardous ones.
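Since `plotROC()` comes from a personal package, here is a small base R sketch of how the AUC itself can be computed from predicted scores, using the rank-sum (Mann-Whitney) formulation. The `auc_rank` helper and the toy scores are hypothetical and are not the code behind `plotROC()`.

```r
# Hypothetical helper: AUC as the share of (positive, negative) pairs
# in which the positive case gets the higher score (ties count half).
auc_rank <- function(score, label, positive = "1") {
  is_pos <- label == positive
  n_pos <- as.numeric(sum(is_pos))
  n_neg <- as.numeric(sum(!is_pos))
  r <- rank(score)  # average ranks handle tied scores
  (sum(r[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

label <- factor(c(0, 0, 1, 1))

# One hazardous case is scored below a non-hazardous one: 3 of 4 pairs correct.
auc_rank(c(0.1, 0.4, 0.35, 0.8), label)  # 0.75

# Every hazardous case is scored above every non-hazardous one.
auc_rank(c(0.1, 0.2, 0.8, 0.9), label)   # 1
```

With the objects above, the equivalent call would be `auc_rank(pred_rf_raw[, 2], y_test)`.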
In conclusion, the final model developed for hazardous asteroid detection is the random forest model. With an Accuracy, Sensitivity and Specificity of 100% on the test set, it identified every hazardous asteroid and every non-hazardous asteroid correctly.
Moreover, the AUC of ROC score of 1.0, a perfect score, shows that the model ranks every hazardous asteroid in the test set above every non-hazardous one. This performance across all evaluation metrics supports its position as a reliable and robust tool for detecting potential asteroid threats.
These results indicate the model’s preparedness for real-world application, where it can contribute to planetary defense and safety.