Meteorites

Author

Robert Gravatt

A Leonid meteor during the peak of the Leonids in 2009. “Leonids Meteor Shower.” Wikimedia Commons. Accessed 15 Dec. 2025. https://en.wikipedia.org/wiki/Leonids.

Photograph from NASA: Meteors and Meteorites https://science.nasa.gov/solar-system/meteors-meteorites/

Meteor vs. Meteorite

Meteor: The streak of light caused by a meteoroid entering the atmosphere.

Meteorite: The fragment that survives the journey through the atmosphere and lands on Earth.

Introduction

Meteorites are fragments of rock or metal from space that survive their passage through Earth’s atmosphere and land on the surface. The dataset for this project comes from the Meteoritical Bulletin Database, compiled and maintained by the Meteoritical Society [2] and made available through NASA’s website [1]. It includes information such as meteorite name, identification number, classification (recclass), year of fall or discovery, fall type (observed vs. found later), and geographic coordinates. These variables span both quantitative (mass, year, coordinates) and categorical (type, classification) dimensions, making the dataset well-suited for statistical analysis and visualization.

The data were collected through a combination of field reports, laboratory classification, and international contributions coordinated by the Meteoritical Society [2]. While the dataset does not include a detailed methodology file, the Bulletin itself serves as the authoritative record of meteorite discoveries. In total, the dataset documents over 45,000 meteorites that either landed or were discovered between the years 860 and 2013 [2]. Unfortunately, more recent meteorites are not included, because this dataset has not been extended beyond 2013.

I chose this dataset because meteorites connect astronomy, geology, and history. They provide insight into the composition of the solar system and hold cultural significance in how societies record unusual celestial events [3]. For me, this project is meaningful because it combines scientific rigor with a dataset that is global in scope and rich in both categorical and quantitative detail.

Background Research

Meteorites provide critical evidence about the early solar system, planetary formation, and the composition of celestial bodies. Most observed falls are chondrites, which make up roughly 93% of meteorite falls, while irons account for only about 6% but are disproportionately recovered due to their durability [3]. Iron meteorites are thought to originate from the cores of differentiated planetesimals, giving them unique compositional and structural properties that distinguish them from stony meteorites [4]. Studies of fall rates and periodicity further suggest that meteorite impacts follow long-term cycles, reinforcing the importance of statistical modeling in understanding observed distributions [5]. This background helps explain why Iron meteorites stand out so strongly from the other groups in the statistical analysis of fall status in this project.

Load the Data Set

library(tidyverse)

meteorites <- read_csv("Meteorite_Landings.csv")

Data Cleaning and Wrangling

To prepare the meteorite dataset for analysis, I first standardized the column names and variable types. The original dataset recorded meteorite mass under the column mass (g), which I renamed to mass_g for easier reference in R. I then converted the year variable to numeric, and recoded both fall (whether the meteorite was observed falling or later found) and recclass (meteorite classification) as factors, since they are categorical variables.

Next, I applied filters to ensure the dataset contained only valid and meaningful records. I removed entries with missing values for mass, year, latitude, longitude, or fall type, since these variables are essential for the analyses and visualizations. I restricted the year range to 860 through 2013, which matches the known coverage of the Meteoritical Bulletin Database and excludes erroneous entries dated outside that window. Latitude and longitude were constrained to valid ranges (–90 to 90 for latitude, –180 to 180 for longitude), and I excluded meteorites with non‑positive mass values.

These steps ensured that the dataset was consistent, clean, and ready for statistical modeling and visualization. By carefully filtering and recoding variables rather than simply dropping rows indiscriminately, I preserved the integrity of the dataset while removing problematic entries that could bias the results.

# Rename the mass column, set variable types, and drop invalid or out-of-range records
meteorites_clean <- meteorites |>
  rename(mass_g = `mass (g)`) |>
  mutate(
    year = as.numeric(year),
    fall = as.factor(fall),
    recclass = as.factor(recclass)
  ) |>
  filter(!is.na(mass_g),
         !is.na(year),
         !is.na(reclat),
         !is.na(reclong),
         !is.na(fall),
         year >= 860, year <= 2013,
         between(reclat, -90, 90),
         between(reclong, -180, 180),
         mass_g > 0)

Statistical Analysis

Logistic Regression with 4 Classification Groups

To examine whether meteorite classification and size predict fall type, I grouped the wide variety of recclass values into four broad categories: Iron meteorites, Stony meteorites (matched here by the Howardite, Eucrite, and Diogenite labels), Chondrites, and Other. These groupings simplify the dataset into interpretable categories while preserving the major scientific distinctions. I then fit a logistic regression model with fall type (Fell vs. Found) as the binary outcome, using both classification group and log‑transformed mass as predictors. Because fall is a factor whose first level is “Fell,” glm() treats “Found” as the modelled outcome, so a positive coefficient means higher odds of being found rather than observed falling. This approach tests whether meteorite type and size influence the likelihood of a meteorite being observed falling rather than later discovered.

meteorites_clean <- meteorites_clean |>
  mutate(
    recclass_grouped = case_when(
      str_detect(recclass, regex("Iron", ignore_case = TRUE)) ~ "Iron",
      str_detect(recclass, regex("HED|Eucrite|Diogenite|Howardite", ignore_case = TRUE)) ~ "Stony",
      str_detect(recclass, regex("Chondrite|LL|L |H ", ignore_case = TRUE)) ~ "Chondrite",
      TRUE ~ "Other"
    ),
    recclass_grouped = as.factor(recclass_grouped)
  )


logistic_model <- meteorites_clean |>
  glm(fall ~ recclass_grouped + log10(mass_g), data = _, family = binomial)

summary(logistic_model)


Call:
glm(formula = fall ~ recclass_grouped + log10(mass_g), family = binomial, 
    data = meteorites_clean)

Coefficients:
                      Estimate Std. Error z value Pr(>|z|)    
(Intercept)             7.0633     0.1461  48.331  < 2e-16 ***
recclass_groupedIron    2.6166     0.2151  12.166  < 2e-16 ***
recclass_groupedOther  -0.1283     0.1186  -1.082    0.279    
recclass_groupedStony  -1.0737     0.1837  -5.844 5.11e-09 ***
log10(mass_g)          -1.4212     0.0315 -45.113  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 9719.2  on 38094  degrees of freedom
Residual deviance: 7059.9  on 38090  degrees of freedom
AIC: 7069.9

Number of Fisher Scoring iterations: 7
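
When reading the signs in this output, it helps to know which outcome glm() treats as the "success." For a binomial model with a factor response, R takes the first factor level as failure and the remaining levels as success, and as.factor() orders levels alphabetically. The sketch below checks this on simulated data. The toy data frame and its variable names are invented for illustration and are not part of the meteorite dataset.

```r
# Toy check of how glm() codes a two-level factor response.
# All data here are simulated; nothing comes from the meteorite file.
set.seed(1)

x <- rnorm(500)

# Make "Found" more likely as x increases
p_found <- plogis(2 * x)
outcome <- ifelse(runif(500) < p_found, "Found", "Fell")

toy <- data.frame(x = x, fall = as.factor(outcome))

# as.factor() sorts levels alphabetically, so "Fell" comes first
levels(toy$fall)

toy_model <- glm(fall ~ x, data = toy, family = binomial)

# A positive slope here means the fitted probability is P(fall == "Found"),
# because the first level ("Fell") is treated as failure
coef(toy_model)["x"]
```

If the probability of "Fell" is wanted instead, one option is to model the logical expression fall == "Fell" as the response, which should flip the sign of every coefficient while leaving the fit otherwise unchanged.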

Interpretation:

The logistic regression shows that meteorite classification and mass are strong predictors of fall type. Because glm() treats the first level of the fall factor (“Fell”) as the reference outcome, the coefficients describe the log-odds of a meteorite being recorded as “Found.” Relative to Chondrites, Iron meteorites have much higher odds of being found, while Stony meteorites have significantly lower odds, meaning a larger share of them were observed falling. Mass also plays a key role: the negative coefficient on log10(mass_g) means that heavier meteorites are more often recorded as “Fell,” whereas lighter ones tend to be “Found.” The regression equation may be written as:

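$$\operatorname{logit}(p) = 7.0633 + 2.6166\,D_{\text{Iron}} - 0.1283\,D_{\text{Other}} - 1.0737\,D_{\text{Stony}} - 1.4212\,\log_{10}(\text{mass\_g})$$

where each $D$ takes a 0 or 1 value and $\operatorname{logit}(p) = \ln\big(p/(1-p)\big)$.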
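
To make the equation concrete, the sketch below plugs the reported coefficients into the inverse logit for a 1 kg specimen. The helper predict_p() is a hypothetical name introduced only for this illustration. The probability it returns is for whichever outcome glm() coded as success, which for a factor with levels Fell and Found would be “Found.”

```r
# Coefficients copied from the summary output above (Chondrite baseline)
b <- c(intercept = 7.0633, iron = 2.6166, other = -0.1283,
       stony = -1.0737, log10_mass = -1.4212)

# Hypothetical helper: probability of the modelled outcome for one specimen
predict_p <- function(mass_g, group = c("Chondrite", "Iron", "Other", "Stony")) {
  group <- match.arg(group)
  eta <- b[["intercept"]] + b[["log10_mass"]] * log10(mass_g)
  # Chondrite is the baseline, so only the other groups add a dummy term
  if (group != "Chondrite") eta <- eta + b[[tolower(group)]]
  plogis(eta)  # inverse logit: 1 / (1 + exp(-eta))
}

p_chondrite_1kg <- predict_p(1000, "Chondrite")
p_iron_1kg      <- predict_p(1000, "Iron")

round(c(chondrite = p_chondrite_1kg, iron = p_iron_1kg), 3)
```

For a 1 kg Chondrite the linear predictor is 7.0633 − 1.4212 × 3 = 2.7997, which corresponds to a probability of about 0.943. Adding the Iron term raises it to about 0.996.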

Odds Ratio (Chondrite baseline)

# Odds ratios with confidence intervals
exp(cbind(OR = coef(logistic_model), confint(logistic_model)))

Waiting for profiling to be done...

                                OR       2.5 %       97.5 %
(Intercept)           1168.2749485 882.4702871 1565.4027628
recclass_groupedIron    13.6885852   9.0384407   21.0258750
recclass_groupedOther    0.8795672   0.6933494    1.1040823
recclass_groupedStony    0.3417331   0.2390594    0.4918302
log10(mass_g)            0.2414218   0.2268367    0.2566593

Odds Ratio (Stony baseline):

# Relevel to make "Stony" the baseline
meteorites_clean <- meteorites_clean |>
  mutate(recclass_grouped = relevel(recclass_grouped, ref = "Stony"))

# Fit logistic regression with Stony as baseline
model_stony <- glm(fall ~ recclass_grouped + log10(mass_g),
                   data = meteorites_clean,
                   family = binomial)

summary(model_stony)


Call:
glm(formula = fall ~ recclass_grouped + log10(mass_g), family = binomial, 
    data = meteorites_clean)

Coefficients:
                          Estimate Std. Error z value Pr(>|z|)    
(Intercept)                 5.9896     0.1691  35.421  < 2e-16 ***
recclass_groupedChondrite   1.0737     0.1837   5.844 5.11e-09 ***
recclass_groupedIron        3.6903     0.2361  15.627  < 2e-16 ***
recclass_groupedOther       0.9454     0.1504   6.287 3.23e-10 ***
log10(mass_g)              -1.4212     0.0315 -45.113  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 9719.2  on 38094  degrees of freedom
Residual deviance: 7059.9  on 38090  degrees of freedom
AIC: 7069.9

Number of Fisher Scoring iterations: 7

exp(cbind(OR = coef(model_stony), confint(model_stony)))

Waiting for profiling to be done...

                                   OR       2.5 %      97.5 %
(Intercept)               399.2382169 288.9030254 560.9287676
recclass_groupedChondrite   2.9262603   2.0332220   4.1830614
recclass_groupedIron       40.0563635  25.2961355  63.9337718
recclass_groupedOther       2.5738425   1.8999497   3.4285646
log10(mass_g)               0.2414218   0.2268367   0.2566593

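With Stony as the baseline, the regression equation is:

$$\operatorname{logit}(p) = 5.9896 + 1.0737\,D_{\text{Chondrite}} + 3.6903\,D_{\text{Iron}} + 0.9454\,D_{\text{Other}} - 1.4212\,\log_{10}(\text{mass\_g})$$

where each $D$ takes a 0 or 1 value and $\operatorname{logit}(p) = \ln\big(p/(1-p)\big)$.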

Odds Ratio (Other baseline)

# Relevel to make "Other" the baseline
meteorites_clean <- meteorites_clean |>
  mutate(recclass_grouped = relevel(recclass_grouped, ref = "Other"))

# Fit logistic regression with Other as baseline
model_other <- glm(fall ~ recclass_grouped + log10(mass_g),
                   data = meteorites_clean,
                   family = binomial)

summary(model_other)


Call:
glm(formula = fall ~ recclass_grouped + log10(mass_g), family = binomial, 
    data = meteorites_clean)

Coefficients:
                          Estimate Std. Error z value Pr(>|z|)    
(Intercept)                 6.9350     0.1038  66.795  < 2e-16 ***
recclass_groupedStony      -0.9454     0.1504  -6.287 3.23e-10 ***
recclass_groupedChondrite   0.1283     0.1186   1.082    0.279    
recclass_groupedIron        2.7449     0.1859  14.762  < 2e-16 ***
log10(mass_g)              -1.4212     0.0315 -45.113  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 9719.2  on 38094  degrees of freedom
Residual deviance: 7059.9  on 38090  degrees of freedom
AIC: 7069.9

Number of Fisher Scoring iterations: 7

exp(cbind(OR = coef(model_other), confint(model_other)))

Waiting for profiling to be done...

                                    OR       2.5 %       97.5 %
(Intercept)               1027.5763054 840.7818964 1263.2192063
recclass_groupedStony        0.3885242   0.2916672    0.5263297
recclass_groupedChondrite    1.1369228   0.9057296    1.4422742
recclass_groupedIron        15.5628648  10.9237571   22.6639335
log10(mass_g)                0.2414218   0.2268367    0.2566593

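With Other as the baseline, the regression equation is:

$$\operatorname{logit}(p) = 6.9350 + 0.1283\,D_{\text{Chondrite}} + 2.7449\,D_{\text{Iron}} - 0.9454\,D_{\text{Stony}} - 1.4212\,\log_{10}(\text{mass\_g})$$

where each $D$ takes a 0 or 1 value and $\operatorname{logit}(p) = \ln\big(p/(1-p)\big)$.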

Odds Ratio Interpretation:

The logistic regression results make clear that meteorite classification and mass strongly predict fall type, with Iron meteorites standing apart from every other group. Because glm() models the second level of the fall factor, all odds ratios here are odds of being recorded as “Found” rather than “Fell.” When Chondrites are the baseline, Iron meteorites have an odds ratio of 13.69 (95% CI: 9.04–21.03), so their odds of being found are nearly fourteen times those of Chondrites. Re‑leveling the factor confirms this pattern across other comparisons: the odds for Iron meteorites are about 40 times those of Stony meteorites (OR = 40.06, 95% CI: 25.30–63.93) and about 16 times those of Other meteorites (OR = 15.56, 95% CI: 10.92–22.66). In contrast, Stony meteorites have significantly lower odds of being found than Chondrites (OR = 0.34, 95% CI: 0.24–0.49), which means they are comparatively more likely to appear as observed falls, and the “Other” category shows no meaningful difference from Chondrites (OR = 0.88, 95% CI: 0.69–1.10). Mass also exerts a strong effect: each ten‑fold increase in mass reduces the odds of being found by about 76% (OR = 0.24, 95% CI: 0.23–0.26), so heavier specimens are more likely to be recorded as falls. Taken together, these results show that Iron meteorites are consistently and substantially more likely to be finds than any comparison group, which is consistent with the durability noted in the background research [3].
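
The three odds-ratio tables come from the same model under different reference levels, so their values can be cross-checked by hand. As a worked check using only numbers printed in the output above (rounded to four significant figures):

```latex
\begin{aligned}
\mathrm{OR}_{\text{Iron vs Chondrite}} &= e^{2.6166} \approx 13.69 \\[4pt]
\mathrm{OR}_{\text{Iron vs Stony}} &= \frac{\mathrm{OR}_{\text{Iron vs Chondrite}}}{\mathrm{OR}_{\text{Stony vs Chondrite}}}
  = \frac{13.69}{0.3417} \approx 40.06 = e^{3.6903} \\[4pt]
\mathrm{OR}_{\text{mass}} &= e^{-1.4212} \approx 0.2414
  \quad\Longrightarrow\quad (0.2414 - 1) \times 100\% \approx -75.9\%
\end{aligned}
```

The second line shows why re-leveling changes the reported coefficients but not the fitted model: a ratio of two odds ratios against a common baseline gives the direct comparison. The last line is the source of the "about 76%" reduction in odds per ten-fold increase in mass.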

Logistic Regression Plots (with Chondrites as baseline)

# Predicted probabilities; with "Fell" as the first factor level,
# predict(type = "response") returns P(Found)
meteorites_clean$pred_prob <- predict(logistic_model, type = "response")

plot(log10(meteorites_clean$mass_g), meteorites_clean$pred_prob,
     xlab = "log10(Mass in g)", ylab = "Probability of Found",
     main = "Logistic Regression Sigmoid (S) Curves",
     pch = 19, col = "darkblue")

Faceted Logistic Regression Plots

# Fit logistic model
logistic_model <- glm(fall ~ recclass_grouped + log10(mass_g),
                      data = meteorites_clean, family = binomial)

# Predict probabilities
meteorites_clean$pred_prob <- predict(logistic_model, type = "response")

# Split data by classification group
groups <- split(meteorites_clean, meteorites_clean$recclass_grouped)

# Set up plotting area: 2 rows, 2 columns
par(mfrow = c(2, 2))

# Loop through each group and plot; invisible() suppresses the list of NULLs
# that lapply() would otherwise print
invisible(lapply(names(groups), function(group_name) {
  group_data <- groups[[group_name]]
  with(group_data, {
    plot(log10(mass_g), pred_prob,
         main = paste("Group:", group_name),
         xlab = "log10(Mass in g)",
         ylab = "Probability of Found",
         pch = 19, col = "darkblue")
  })
}))
Interpretation

The predicted probabilities plotted here are probabilities of being recorded as “Found,” since glm() models the second level of the fall factor. Read this way, and with Chondrites taken as the baseline, the curves reveal clear differences among the four meteorite groups. Iron meteorites consistently show the highest probability of being found, with their curve remaining elevated across the mass range, which reflects their large odds ratio relative to Chondrites. Stony meteorites, by contrast, display the lowest probabilities, with the curve dropping at smaller masses than for the other groups, so a larger share of them are witnessed falls. The Chondrite curve itself represents the reference pattern, showing a decline in the probability of being found as mass increases. The “Other” group closely follows the Chondrite baseline, with only slight downward shifts, indicating no meaningful difference. Together, these curves identify Iron meteorites as the class most likely to be found rather than seen falling, Stony meteorites as the least, and Chondrites and Other types as broadly similar.

Conclusion of Statistical Analysis

The regression models and probability plots converge on the same finding: meteorite classification, particularly Iron, is a strong predictor of fall status. Because the models estimate the odds of being recorded as “Found,” the odds ratios well above one for Iron meteorites and their consistently elevated probability curves mean that Irons are far more likely to be finds than observed falls. In contrast, Stony meteorites displayed sharply reduced probabilities of being found, and Chondrites and Other types tracked closely together with little distinction from the baseline. These results agree with prior research noting the durability and distinctive composition of Iron meteorites, which make them disproportionately likely to be recovered long after they land [3][4]. Taken together, the statistical evidence highlights meteorite class as a decisive factor in predicting fall status, with Iron standing out as the group most strongly associated with finds.

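NB: the regex grouping is imperfect. Many carbonaceous and enstatite chondrites were grouped into ‘Other’, and because the patterns "L " and "H " require a trailing space, labels written without one (for example H5 or L6) would also fall into ‘Other’. As a result, the proportion of chondrites is lower than in published fall statistics.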
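
To see how the grouping patterns sort individual labels, the sketch below applies the same case_when() rules to a handful of sample labels. The labels are illustrative, written in the style of the recclass column, and whether each one occurs in the file is an assumption.

```r
library(dplyr)
library(stringr)

# Illustrative labels in the style of the recclass column (assumed, not extracted)
sample_labels <- data.frame(
  recclass = c("H5", "L6", "LL6", "CM2", "Iron, IIAB",
               "Eucrite-mmict", "Pallasite, PMG")
)

# Same patterns and order as the grouping step used for the models
grouped <- sample_labels |>
  mutate(
    recclass_grouped = case_when(
      str_detect(recclass, regex("Iron", ignore_case = TRUE)) ~ "Iron",
      str_detect(recclass, regex("HED|Eucrite|Diogenite|Howardite", ignore_case = TRUE)) ~ "Stony",
      str_detect(recclass, regex("Chondrite|LL|L |H ", ignore_case = TRUE)) ~ "Chondrite",
      TRUE ~ "Other"
    )
  )

grouped
```

With these rules, "H5", "L6", and "CM2" land in Other, while "LL6" and "Pallasite, PMG" land in Chondrite, the latter because the case-insensitive pattern "LL" matches the "ll" in "Pallasite". If labels such as H5 and L6 are common in the data, the Other group would contain many ordinary chondrites, which is worth keeping in mind when comparing the Chondrite and Other results.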

Data Visualizations

1. Faceted Violin–Boxplot Comparison of Meteorite Mass by Class and Fall Status

library(RColorBrewer)


# A palette with 4 colors
cols <- brewer.pal(4, "Set2")

# Compute sample sizes per group and fall status
counts <- meteorites_clean |>
  group_by(recclass_grouped, fall) |>
  summarise(n = n(), .groups = "drop")

ggplot(meteorites_clean, aes(x = recclass_grouped, y = log10(mass_g), fill = recclass_grouped)) +
  geom_violin(trim = FALSE, alpha = 0.6) +
  geom_boxplot(width = 0.2, color = "black", fill = "white", outlier.size = 0.5, alpha = 1) +
  # Place sample size labels near the bottom
  geom_text(data = counts,
            aes(x = recclass_grouped,
                y = min(log10(meteorites_clean$mass_g)) - 0.2,   # just below the lowest values
                label = paste0("n=", n)),
            inherit.aes = FALSE, color = "black", size = 3) +
  scale_fill_manual(values = cols) +
  facet_wrap(~ fall) +
  labs(title = "Meteorite Mass Distributions by Class and Fall Status",
       x = "Classification Group",
       y = "log10(Mass in g)") +
  theme_minimal() +
  theme(legend.position = "none",
        strip.text = element_text(face = "bold"))

NB: many carbonaceous and enstatite chondrites were grouped into ‘Other’ due to regex limitations, so the proportion of chondrites is lower than in published fall statistics.

Interpretation:

This faceted violin–boxplot visualization illustrates the distribution of meteorite masses across four classification groups, separated by fall status (“Fell” vs. “Found”). The vertical axis shows the base-10 logarithm of mass in grams, allowing for clearer comparison across the wide range of observed values. Each violin plot represents a smoothed density estimate, where the width of the shape at any given vertical position reflects the concentration of meteorites at that mass level. Wider sections indicate more frequent values, while narrow regions indicate sparse data; because trim = FALSE is set, the tails extend slightly beyond the observed range. Embedded within each violin is a boxplot that summarizes key distributional features: the central line marks the median log mass, the box spans the interquartile range (middle 50% of the data), and the whiskers extend to the most extreme values within 1.5 times the interquartile range of the box. Points beyond the whiskers are plotted individually as outliers. This dual-layer design provides both a detailed view of distribution shape and a concise summary of central tendency and spread, making it especially effective for comparing groups with differing sample sizes and skewness.
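
The whisker rule can be reproduced by hand. The sketch below uses a small vector of invented log10(mass) values, not taken from the dataset, and assumes the default geom_boxplot() behaviour of drawing whiskers to the most extreme points within 1.5 times the interquartile range.

```r
# Toy log10(mass) values, invented for illustration
log_mass <- c(0.5, 1.0, 1.2, 1.5, 1.8, 2.0, 2.3, 2.6, 3.0, 6.5)

q <- quantile(log_mass, c(0.25, 0.5, 0.75))
iqr <- q[["75%"]] - q[["25%"]]

# Fences: whiskers stop at the most extreme observations inside these limits
lower_fence <- q[["25%"]] - 1.5 * iqr
upper_fence <- q[["75%"]] + 1.5 * iqr

# Points outside the fences are drawn individually as outliers
outliers <- log_mass[log_mass < lower_fence | log_mass > upper_fence]
whisker_top <- max(log_mass[log_mass <= upper_fence])

list(box = q, outliers = outliers, whisker_top = whisker_top)
```

Here the box runs from 1.275 to 2.525, the upper fence sits at 4.4, the upper whisker ends at 3.0, and the value 6.5 is plotted as a single outlier.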

2. Global Map of Meteorites Over 5 kg by Class and Fall Status

library(leaflet)



# Filter and transform
filtered_data <- meteorites_clean |>
  filter(mass_g > 5000, !is.na(reclat), !is.na(reclong), !is.na(year)) |>
  mutate(
    mass_kg = round(mass_g / 1000, 1),
    year_fall = year
  )

# Palette based on filtered data
cols <- brewer.pal(4, "Set2")
pal <- leaflet::colorFactor(palette = cols, domain = filtered_data$recclass_grouped)

# Build interactive map
leaflet(filtered_data) |>
  addProviderTiles("Esri.WorldImagery") |>
  addCircleMarkers(
    ~reclong, ~reclat,
    color = ~pal(recclass_grouped),
    popup = ~paste0(
      "<b>Name:</b> ", name,
      "<br><b>Class:</b> ", recclass_grouped,
      "<br><b>Mass (kg):</b> ", mass_kg,
      "<br><b>Fall:</b> ", fall,
      "<br><b>Year:</b> ", year_fall
    ),
    radius = ~ifelse(fall == "Fell", 8, 4),   # Fell larger, Found smaller
    stroke = FALSE, fillOpacity = 0.7
  ) |>
  addLegend("bottomright", pal = pal, values = ~recclass_grouped,
            title = "Classification Group")

This interactive map is designed to show the global distribution of meteorites over 5 kg, separated by classification group and fall status. By plotting each meteorite at its recorded latitude and longitude, the visualization allows you to explore spatial patterns—such as whether certain types are more often found in particular regions or whether “Fell” meteorites cluster differently than “Found” ones. The circle markers vary in size to distinguish fall status (larger for “Fell,” smaller for “Found”), and colors indicate the classification group.

The pop‑ups provide detailed information for each meteorite when you click on a marker. They include the name of the meteorite, its classification group, its mass in kilograms, whether it was observed falling or later found, and the year of discovery or fall. This makes the map not only a geographic overview but also a quick reference tool for individual meteorite records.

Project Summary

This project examined meteorite data to explore how classification, mass, and fall status interact in shaping the likelihood of a meteorite being observed falling versus later found. The violin–boxplot visualizations revealed clear differences in mass distributions across classes, with Iron meteorites tending toward higher masses and Chondrites and Stony meteorites showing broader spreads of smaller specimens. The interactive leaflet map extended this analysis spatially, highlighting the global distribution of meteorites over 5 kg and allowing exploration of individual records by class, mass, fall type, and year. Together with the logistic regression models and probability curves, these visualizations reinforce the conclusion that meteorite class, particularly Iron, is a decisive predictor of fall status.

References:

[1] NASA. (2013). Meteorite Landings. NASA Open Data Portal. Retrieved from https://data.nasa.gov/dataset/meteorite-landings

[2] Meteoritical Society. (2025). Meteoritical Bulletin Database. Retrieved from https://www.lpi.usra.edu/meteor/

[3] Kring, D. A. (2025). Meteorites and Their Properties – Frequency of Falls. Lunar and Planetary Institute. Retrieved from https://www.lpi.usra.edu/science/kring/epo_web/meteorites/falls.html

[4] Goldstein, J. I., Scott, E. R. D., & Chabot, N. L. (2025). Iron Meteorites: Composition, Age, and Origin. Oxford Research Encyclopedia of Planetary Science. Retrieved from https://oxfordre.com/planetaryscience/display/10.1093/acrefore/9780190647926.001.0001/acrefore-9780190647926-e-206

[5] Dudorov, A. E., & Eretnova, O. V. (2020). The Rate of Falls of Meteorites and Bolides. Solar System Research, 54(3), 223–235. Retrieved from https://link.springer.com/article/10.1134/S003809462003003X