Is Earth that much of an anomaly in the galaxy, or are there planets so similar to Earth that we could call them close relatives? The dataset I will use to answer this question comes from the NASA Exoplanet Science Institute. It is titled ‘Transiting Exoplanets’ and is made up of 35,774 observations of planets outside our solar system, often hundreds of light-years away (Christiansen et al. 2025). The dataset compiles information on each planet’s location, its distance from its star, its orbital period, orbital parameters like eccentricity and inclination, and the planet’s size. All of this information can be compared to Earth’s values to find Earth-like exoplanets based on data alone.
To properly analyze the Transiting Exoplanets dataset we first have to
remove the columns we do not need. To clean the data we will
select the following columns from the whole: Planet Name, Orbital Period,
Planet Radius, Eccentricity, Inclination, Ratio of Planet to Stellar
Radius, and Stellar Effective Temperature.
# Keep only the planet name and the six variables used in this analysis
clean_exoplanets <- exoplanets |>
  select(pl_name, pl_orbper, pl_rade, pl_orbeccen, pl_orbincl, pl_ratror, st_teff)
head(clean_exoplanets)
## # A tibble: 6 × 7
## pl_name pl_orbper pl_rade pl_orbeccen pl_orbincl pl_ratror st_teff
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 55 Cnc e 0.737 NA 0.032 NA NA 5292
## 2 55 Cnc e NA NA NA NA NA 5250
## 3 55 Cnc e 0.737 2.17 0 NA NA 5250
## 4 55 Cnc e 2.80 NA 0.264 NA NA 5250
## 5 55 Cnc e 2.81 NA 0.174 NA NA NA
## 6 55 Cnc e NA 1.92 NA 83 0.0187 5250
Here we face a dilemma: each planet appears in several rows, and most of
those rows are missing (NA) at least one variable. Planets vary extremely
widely, so the measurements have to be summarized planet by planet,
grouping by pl_name and averaging each column with
summarize(across(everything(), ...)) while ignoring the NA values.
When a planet has no measurement at all for one of the columns we
require, its average for that column comes out as NaN. Such rows are
not useful for the comparison, so they are dropped right afterward
with na.omit().
# Create a dataset of summarized exoplanets with the per-planet average of each column
summary_exoplanets <- clean_exoplanets |>
  group_by(pl_name) |>
  summarize(across(everything(), \(x) mean(x, na.rm = TRUE))) |>
  na.omit()
head(summary_exoplanets)
## # A tibble: 6 × 7
## pl_name pl_orbper pl_rade pl_orbeccen pl_orbincl pl_ratror st_teff
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 55 Cnc e 0.954 1.97 0.0695 86.0 0.0188 5243.
## 2 AU Mic b 8.46 4.15 0.0652 89.2 0.0499 3664.
## 3 AU Mic c 18.9 2.78 0.0709 89.3 0.033 3664
## 4 BD+20 594 b 41.7 2.39 0 89.5 0.0223 5745
## 5 BD-14 3065 b 4.29 21.2 0.066 80.8 0.0843 6836.
## 6 CoRoT-1 b 1.51 17.7 0.0267 85.4 0.138 6008
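The NaN behavior described above is easy to miss, so here is a minimal sketch of it on a made-up tibble. The names toy, toy_summary, and toy_complete and all of the values are hypothetical and are not part of the exoplanet data; the sketch assumes dplyr is installed and R is version 4.1 or later.

```r
library(dplyr)

# Toy data (made-up values): planet "B" has no radius measurement at all
toy <- tibble(
  pl_name   = c("A", "A", "B", "B"),
  pl_orbper = c(1.0, 3.0, 10.0, NA),
  pl_rade   = c(2.0, NA, NA, NA)
)

# Average every column per planet, ignoring NA values
toy_summary <- toy |>
  group_by(pl_name) |>
  summarize(across(everything(), \(x) mean(x, na.rm = TRUE)))

# The mean of an all-NA column with na.rm = TRUE is NaN, so B's pl_rade is NaN
print(toy_summary)

# na.omit() treats NaN as missing and drops planet B entirely
toy_complete <- na.omit(toy_summary)
print(toy_complete)
```

Only planet A survives, because it has at least one measurement in every column.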
Now our summary dataset has 3,834 individual planets that we can compare to our home planet once we know our home planet’s statistics. The values we need for Earth, in the same column order as the dataset, are:
[Earth, 365.256, 1, 0.0167, 7.155, 0.00968545781228, 5778]
(Clabon Allen 2000, J.L. Simon et al 1994, arXiv:1510.07674)
# Add Earth's data as the first row
summary_exoplanets |>
  add_row(
    pl_name = "Earth", pl_orbper = 365.256, pl_rade = 1, pl_orbeccen = 0.0167,
    pl_orbincl = 7.155, pl_ratror = 0.00968545781228, st_teff = 5778,
    .before = 1
  )
## # A tibble: 3,835 × 7
## pl_name pl_orbper pl_rade pl_orbeccen pl_orbincl pl_ratror st_teff
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Earth 365. 1 0.0167 7.16 0.00969 5778
## 2 55 Cnc e 0.954 1.97 0.0695 86.0 0.0188 5243.
## 3 AU Mic b 8.46 4.15 0.0652 89.2 0.0499 3664.
## 4 AU Mic c 18.9 2.78 0.0709 89.3 0.033 3664
## 5 BD+20 594 b 41.7 2.39 0 89.5 0.0223 5745
## 6 BD-14 3065 b 4.29 21.2 0.066 80.8 0.0843 6836.
## 7 CoRoT-1 b 1.51 17.7 0.0267 85.4 0.138 6008
## 8 CoRoT-10 b 13.2 9.92 0.527 88.6 0.127 5062.
## 9 CoRoT-11 b 2.99 17.1 0.12 83.2 0.105 6408.
## 10 CoRoT-12 b 2.83 16.0 0.06 85.6 0.149 5675
## # ℹ 3,825 more rows
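One detail worth noting is that add_row() returns a new tibble rather than modifying its input, so the call above displays the combined table while summary_exoplanets itself keeps its 3,834 rows. A minimal sketch on a made-up tibble shows the difference; toy and toy_with_earth are hypothetical names and the values are invented for illustration.

```r
library(tibble)

# Toy tibble standing in for summary_exoplanets (made-up values)
toy <- tibble(pl_name = c("toy b", "toy c"), pl_rade = c(1.9, 4.1))

# Without assignment the new row is only printed; toy itself is unchanged
add_row(toy, pl_name = "Earth", pl_rade = 1, .before = 1)
nrow(toy)

# Assigning the result keeps Earth as the first row of a new object
toy_with_earth <- add_row(toy, pl_name = "Earth", pl_rade = 1, .before = 1)
nrow(toy_with_earth)
```

If Earth should appear in later plots or filters, the assigned version is the one to pass along.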
ggplot(summary_exoplanets, aes(x = pl_orbper, y = pl_rade, color = st_teff)) +
  geom_point(size = 1) +
  labs(
    title = "Transiting Exoplanets by Orbital Period, Radius, and Stellar Effective Temperature",
    x = "Orbital Period in Days",
    y = "Radius in Earth Radii",
    color = "Stellar Effective Temperature (K)"
  ) +
  theme_dark()
This graph is zoomed out very far, so let’s limit it to planets with a radius similar to Earth’s (between 0.75 and 1.25 Earth radii) and regraph.
# Keep only planets within a quarter of an Earth radius of Earth's size
earthlike_exoplanets <- summary_exoplanets |>
  filter(pl_rade > 0.75, pl_rade < 1.25)
ggplot(earthlike_exoplanets, aes(x = pl_orbper, y = pl_rade, color = st_teff)) +
  geom_point(size = 1) +
  labs(
    title = "Transiting Exoplanets by Orbital Period, Radius, and Stellar Effective Temperature",
    x = "Orbital Period in Days",
    y = "Radius in Earth Radii",
    color = "Stellar Effective Temperature (K)"
  ) +
  theme_dark()
From the graphs above, we can see that very few exoplanets show characteristics similar to Earth’s. Even after filtering the dataset down to just 386 exoplanets of roughly Earth’s size, the majority of the candidates are far too close to their star (seen as a much shorter orbital period). Overall, there are nearly zero candidates that are similar to Earth in stellar effective temperature, radius, and orbital period at once. If we wanted to move off of our planet to another solar system, we would have to worry about high temperatures, low temperatures, too much or too little gravity, or seasons that last days instead of months.
These results hint that Earth is extremely special and almost entirely on its own based on what we have seen, and that it needs to be cherished because there aren’t any alternatives that we can just get up and go to. In relation to the question posed, yes, Earth is a rarity based on what we have seen, and on this evidence Earth is, at the very least, a one-in-3,834 occurrence. That fraction will only shrink as more planets are catalogued, and even if we find a new Earth-like exoplanet, we can’t say for sure that we could get there, because it could be hundreds or thousands of light-years away.
The main obstacle in researching Earth-like exoplanets was deciding what should count as Earth-like. I first selected planets based on both their radius and their orbital period, which yielded a total of zero results. When I narrowed the criteria down to radius alone, I had to widen the window to a whole quarter of an Earth radius on either side to get a meaningful number of planets to graph. This is not the fault of the dataset; rather, it is evidence of how extremely rare a planet like Earth is.
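As an illustration of what a combined radius-and-period selection can look like, here is a minimal sketch on a made-up tibble. It is not the exact filter used in this analysis: the 25% tolerance, the names toy_planets, earthlike_both, radius_diff, and period_diff, and all of the planet values are assumptions chosen for the example. Only Earth’s radius and orbital period are taken from the values given earlier.

```r
library(dplyr)

# Earth's reference values, as used in the Earth row earlier
earth_radius <- 1        # Earth radii
earth_period <- 365.256  # days

# Hypothetical tolerance: within 25% of Earth's value on both variables
tolerance <- 0.25

# Toy stand-in for summary_exoplanets (values are made up for illustration)
toy_planets <- tibble(
  pl_name   = c("toy b", "toy c", "toy d"),
  pl_orbper = c(3.2, 350.0, 400.0),
  pl_rade   = c(1.10, 0.90, 2.50)
)

# Relative difference from Earth on each variable, then keep close matches
earthlike_both <- toy_planets |>
  mutate(
    radius_diff = abs(pl_rade - earth_radius) / earth_radius,
    period_diff = abs(pl_orbper - earth_period) / earth_period
  ) |>
  filter(radius_diff < tolerance, period_diff < tolerance)

print(earthlike_both)
```

In this toy data only "toy c" passes both tests: "toy b" has the right size but a period of a few days, and "toy d" has a similar period but is far too large. Widening or narrowing the tolerance is the judgment call described above.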
In the future, we have to play the long game. Having identified only about 6,150 total exoplanets according to NASA’s Exoplanet Science Institute (Christiansen et al. 2025), we still have many millions more to go. There simply isn’t enough existing data to say for sure that Earth is alone. As we gather more data, however, we can narrow down the precise conditions under which a planet like ours forms. All in all, the future needs to hold more data, and with more data, scientists can find better candidates, better constraints on the data, and potentially, another Earth.
Jessie L. Christiansen et al. 2025, Planet. Sci. J. 6, 186.
J.L. Simon et al. 1994, Astronomy and Astrophysics 282 (2), 663–683.
Clabon Walter Allen, Arthur N. Cox 2000, Allen’s Astrophysical Quantities, Springer, p. 294.
This research has made use of the NASA Exoplanet Archive, which is operated by the California Institute of Technology, under contract with the National Aeronautics and Space Administration under the Exoplanet Exploration Program.